Handbook of Regional Science (ISBN 9783662607220, 3662607220)


English, 2295 pages, 2020


Table of contents:
Contents
About the Editors
About the Section Editors
Contributors
Regional Science in Perspective: Editorial Introduction
Aims and Scope
Structure of the Handbook
History of Regional Science
Location and Interaction
Regional Housing and Labor Markets
Regional Economic Growth
Innovation and Regional Economic Development
Regional Policy
New Economic Geography, Evolutionary Economic Geography, and Regional Dynamics
Environmental and Natural Resources
Spatial Analysis and Geocomputation
Spatial Statistics
Spatial Econometrics
A Content Cloud Analysis of the Handbook
Closing Remarks
Section I: History of Regional Science
Regional Science Methods in Planning
1 Introduction
1.1 The Origins of Regional Science
2 Bringing Planners on Board in the Early Days
2.1 Before Regional Science
2.2 Helping Planners to Benefit from Regional Science
2.3 A Textbook for Regional Science Methods
3 Regional Science and Penn
3.1 Isard's Planning Ally: Robert Mitchell
4 Testing and Demonstrating Regional Models
4.1 The Philadelphia Regional Input-Output Study
4.2 The Penn-Jersey Transportation Study
4.3 Czamanski's Regional Studies in Nova Scotia
5 Regional Science Methods and British Plan-Making
5.1 The Slow Take-Up of Social Science Among Planners
5.2 Isard's Efforts to Establish a British Section
5.3 Brian McLoughlin's Systems Approach to Planning
5.4 Alan Wilson's Role in Advancing Planning Analysis
5.5 Pragmatic Practitioners
5.6 Methods of Regional Analysis Revised: A Missed Opportunity?
6 Conclusions
7 Cross-References
References
A Reconnaissance Through the History of Shift-Share Analysis
1 Introduction
2 Shift-Share Early Developments
3 Shift-Share Analysis as a Forecasting Tool
4 Finding Meaning in the Drift Component
5 Limits of Time and Dynamic Shift-Share Analysis
6 The Nation Need Not Be the Referent Area
7 The Problem of Productivity Differentials
8 Digging Deeper into Industry Shifts
9 Beyond Traditional Shift-Share Analysis: Structural Decomposition Analysis
10 Shift-Share in a Regression Framework
11 Application to Other Topics
12 Conclusions
13 Cross-References
References
City-Size Distribution: The Evolution of Theory, Evidence, and Policy in Regional Science
1 Introduction
2 Foundations: 1935-1955
2.1 Background
2.2 The Main Contributions
2.2.1 The Rank-Size Distribution
2.2.2 Random Growth
2.2.3 Central Place Theory
3 New Perspectives: 1955-1975
3.1 Background
3.2 The Main Contributions
3.2.1 Primacy
3.2.2 Location Theory
3.2.3 Hierarchy Models
3.2.4 Empirical Regularities
3.2.5 Early Simulations
4 Mature Perspectives: 1975-1995
4.1 Background
4.2 The Main Contributions
4.2.1 Deeper Empirical Investigations
4.2.2 Better Hierarchy Models
4.2.3 More Advanced Simulations
5 Summary and Conclusion
6 Cross-References
References
Classical Contributions: Von Thünen and Weber
1 Introduction
2 The Location of Activities in Space: Land Rent Formation
2.1 Accessibility and Transportation Costs
2.2 The Location of Agricultural Activities: The von Thünen Model
2.3 The Location of Activities in a City: A Partial Spatial Equilibrium Approach
2.4 The Location of Activities in a City: A General Spatial Equilibrium Approach
2.5 Critical Remarks
3 The Location of Industrial Activities in Space: Weber's Model
3.1 Transportation Costs and Agglomeration Economies
3.2 Weber's Model
3.3 Critical Remarks on the Model
4 Conclusions
References
Market Areas and Competing Firms: History in Perspective
1 Introduction
2 Theoretical Background
3 Theoretical Modeling of Location Choices
4 Basic Modeling Principles and Assumptions
5 Experimenting with the Hotelling Model
6 Analysis of the Simulation Results
7 Conclusions
References
Spatial Interaction Models: A Broad Historical Perspective
1 Introduction and Notation
2 Early Ideas
3 Social Physics
4 Maximum Entropy
5 Maximum Utility
6 Random Choice
7 Trade
7.1 The Specific Goods Model
7.2 The Monopolistic Competition Model
7.3 The Ricardian Model
8 Conclusion
9 Cross-References
References
Regional Science Methods in Business
1 Introduction
2 Macroscale Models
3 Mesoscale Models
4 Microscale Models
5 Conclusions
6 Cross-References
References
Worlds Lost and Found: Regional Science Contributions in Support of Collective Decision-Making for Collective Action
1 Introduction
2 Ontologies Adequate to the World in Which We Plan
2.1 Ontologizing from Within Planning
3 Viewing Actors from the Perspective of Behavioral Economics and Complexity Theory
3.1 Endogenizing Social Structure
3.1.1 Assumptions
3.1.2 Additional Notation
3.1.3 Model Development
3.1.4 A Dynamic Model of Individual Decision-Making
4 Formal Analysis of Subjunctive Reasoning
4.1 Subjunctive Reasoning in Policy Analysis
5 Causal Analysis
5.1 Example of Causal and Counterfactual Analysis in a Planning Support System
6 Modeling Collective Decision-Making in Design Behavior
7 Conclusion
8 Cross-References
References
Section II: Location and Interaction
Travel Behavior and Travel Demand
1 Introduction
2 The Behavior of Individuals
3 Modeling Travel Behavior and Demand
4 The Elasticity of Travel Demand
5 Using Travel Behavior and Travel Demand Information
6 Conclusions
References
Activity-Based Analysis
1 Introduction
2 Conceptual Foundations of Activity-Based Analysis
3 Policy and Technology Context for Activity-Based Analysis
4 Activity Data Collection and Analysis
5 Activity-Based Modeling
6 Frontiers in Activity-Based Analysis
7 Conclusion
References
Social Network Analysis
1 Introduction
2 Social Media as Social Networks
3 The Leading Social Media/Social Networking Sites
3.1 Analysis of Facebook's Growth
3.2 Analysis of Google's Growth
3.3 Google-Facebook Competition
4 Social Networking Sites-Print Media Competition
5 Location and Social Network Technology
5.1 Regional Contextual Implications for the Formation of Online Social Networks
5.2 Hyperlocal Marketing in Social Networks
6 Antitrust Action and Social Networks
7 Quantitative Social Network Analytics
7.1 Measures of Structure and Visualization
7.2 Procedures for Content Analysis
8 How Value Is Generated with a Social Network
8.1 Microtargeting and Fake News
9 Analyzing Fake News on Social Networks: Regional Science Implications
10 Social Networks and Social Media for Illegal Activities
11 Conclusion
12 Cross-References
References
Further Reading
Land-Use Transport Interaction Models
1 Introduction
2 Theory
3 Operational Models
3.1 Spatial Interaction Location Models
3.2 Accessibility-Based Location Models
4 Current Debates
4.1 Equilibrium or Dynamics
4.2 Macro or Micro
5 Future Challenges
6 Conclusions
References
Network Equilibrium Models for Urban Transport
1 Introduction
2 Historical Overview
3 Model Formulations
3.1 Definitions and Assumptions
3.2 Methodological Approach
3.3 Deterministic Route Choice over a Road Network
3.4 Stochastic Route Choice over a Road Network
3.5 Mode and Route Choice over Road and Fixed Cost Networks
3.6 O-D, Mode, and Route Choice over Road and Fixed Cost Networks
4 Model Solution and Implementation
4.1 Solution Algorithms
4.2 Unique Route Flows and Multi-class Link Flows
5 Conclusions
References
Supply Chains and Transportation Networks
1 Introduction
2 Fundamental Decision-Making Concepts and Models
2.1 The User-Optimized Problem
2.1.1 Transportation Network Equilibrium Conditions
2.2 The System-Optimized Problem
2.2.1 System-Optimality Conditions
2.3 The Braess Paradox
3 Models with Asymmetric Link Costs
3.1 Variational Inequality Formulations of Fixed Demand Problems
3.2 Variational Inequality Formulations of Elastic Demand Problems
4 Conclusions
5 Cross-References
References
Computable Models of Dynamic Spatial Oligopoly from the Perspective of Differential Variational Inequalities
1 Introduction
2 The Notion of a Nash Equilibrium
3 Competitive Oligopoly
3.1 Competitive Spatial Oligopoly
3.2 Variational Inequality (VI) Formulations of Spatial Oligopoly
3.3 Diagonalization Algorithm for Variational Inequality
4 Static Competitive Network Oligopoly
5 Dynamic Network Oligopoly
5.1 Notation
5.2 The Firm's Objective Functional, Dynamics, and Constraints
5.3 The Differential Variational Inequality Formulation of Dynamic Network Oligopoly
5.4 Discrete-Time Approximation
5.5 A Comment About Path Variables
5.6 Numerical Example
5.7 Interpretation of Numerical Results
6 Conclusions
7 Cross-References
References
Complexity and Spatial Networks
1 Introduction
2 Complexity and Spatial Networks
3 Static Complexity and Models
3.1 Preface
3.2 Static Complexity and Static Models in Spatial Economic Analysis
4 Dynamic Complexity and Models
4.1 Simple Models vs. Dynamic Complexity
4.2 Less Simple Models vs. Dynamic Complexity
4.3 Concluding Remarks
5 Complexity and Network Analysis
5.1 Preface
5.2 Simple Network Models: Random and Scale-Free Networks
5.3 Concluding Remarks
6 Spatial Economics and Network Analysis: Connectivity, Emergence, and Resilience
7 Conclusions
8 Cross-References
References
Factor Mobility and Migration Models
1 Introduction
2 Welfare Impacts: The Classical View
3 Heterogeneous Skills
4 Heterogeneous Preferences and the Gravity Model
5 Dynamics of Factor Mobility
6 Migration and Agglomeration
7 Migration Externalities and the Welfare State
8 Conclusions
9 Cross-References
References
Interregional Trade: Models and Analyses
1 Introduction
2 Theories on Trade With and Without Barriers
2.1 Technological Differences: Comparative Advantage Instead of Absolute Advantage
2.2 Trade Driven by Factor Endowment Differences: The Heckscher-Ohlin Model
2.3 New Trade Theory: Economies of Scale and Love of Variety
3 Interregional Trade: Alternative Approaches
3.1 Vertical Specialization and Trade Overlap
3.2 Spatial Production Cycles
4 Interregional Trade Impacts from International Trade
5 Conclusions
6 Cross-References
References
Interregional Input-Output Models
1 Introduction
2 The Base IO Table and the Demand-Driven IO Quantity Model
3 Adding Prices Without Interaction: The Dual, Cost-Push Price Model
4 Adding Trade: The Interregional IO Table and Model
5 Adding Endogenous Consumption to the Interregional IO Model
6 Further Demo-Economic Extensions of the Interregional IO Model
7 Conclusion
8 Cross-References
Appendix: The Micro-economic Foundation of the Leontief and the Ghosh IO Model
References
Section III: Regional Housing and Labor Markets
Real Estate and Housing Markets
1 Introduction
2 What Exactly Is Real Estate?
3 Developers and the Development Process
4 A Property and Asset Market Model
5 The "Internationalization" of Property and Asset Markets
6 The Housing Market
6.1 Demand, Supply, and House Price Determination
6.2 The Effect of Planning Controls on the Housing Market
6.3 Other Types of Intervention in the Housing Market
6.4 Housing Market and the Macroeconomy
6.4.1 House Prices and Consumer Expenditure
6.4.2 The Effect of Housing on the Labor Market
6.4.3 Housing Markets in the Global Economy
7 Concluding Remarks
References
Housing Choice, Residential Mobility, and Hedonic Approaches
1 Introduction
2 Residential Mobility
3 House Price Hedonics
4 Conclusions
References
Migration and Labor Market Opportunities
1 Introduction
2 Measuring Economic Opportunities
3 Unemployment Rates
4 Income, Earnings, and Wages
5 Employment Opportunities
6 Conclusions
References
Job Search Theory
1 Introduction
2 The Standard Job Search Model
2.1 Basic Formulation
2.2 Extensions
2.2.1 Violation of Assumption (v): Finite Lifetime
2.2.2 Violation of Assumption (vi): The Seeker Receives a Different Number of Offers in Each Period
2.2.3 Violation of Assumption (vii): Allowing Recall
2.2.4 Violation of Assumption (viii): On-the-job Search
2.2.5 Violation of Assumption (i): Unknown Wage Distribution
3 The Matching Function
4 Job Search and Migration
5 Conclusions and Further Avenues for Research
6 Cross-References
References
Commuting, Housing, and Labor Markets
1 Introduction
2 The Monocentric Model
3 "Wasteful" Commuting
4 Transport Modes, Sorting, and Urban Sprawl
5 Density, Diversity, and Agglomeration
6 Owning, Renting, and Unemployment
7 Conclusion
Appendix: Computation of the Reservation Wage
References
Labor Market Theory and Models
1 Introduction
2 Labor Market Theory
2.1 Labor Supply
2.2 Labor Demand
2.3 Labor Market Equilibrium
3 Defining Labor Market Areas
3.1 Historical Efforts
3.2 Cluster-Based Analysis
4 Labor Market Area Analyses
4.1 Explaining Differences in Labor Earnings
4.2 Models of Spatial Adjustment: Booms and Busts
5 Conclusions
References
Spatial Equilibrium in Labor Markets
1 Introduction
2 The Traditional Labor Economics View of Spatial Labor Markets
3 An Urban/Regional View of Spatial Labor Markets
4 Spatial Labor Market Equilibrium in the Urban/Regional View with Suggestions for Future Research
5 Conclusions
References
Regional Employment and Unemployment
1 Introduction
2 The Determinants of Regional Disparities in Unemployment Rates
3 A Simple Model Based on Supply and Demand
3.1 Labor Demand
3.2 Labor Supply
4 Changes in Labor Demand and Productivity
5 Changes in Industry Composition
6 Labor Supply Constraints
6.1 Human Capital and Skills
6.2 Demographic Factors
6.2.1 Age and Gender Differences
6.3 Barriers to Labor Mobility
7 Policy Constraints
7.1 Social Insurance
7.2 Place-Based Policies That Limit Mobility
8 Conclusions
References
Spatial Mismatch, Poverty, and Vulnerable Populations
1 Introduction
2 The Theory of Spatial Mismatch
3 The Empirical Tests of Spatial Mismatch
4 Local Policies to Reduce Poverty
5 Conclusions
6 Cross-References
References
Section IV: Regional Economic Growth
Neoclassical Regional Growth Models
1 Introduction
2 The Solow-Swan Model
2.1 Neoclassical Production Function
2.2 Capital Accumulation Equation
2.3 Labor Supply Function
2.4 Solow Diagram and Dynamics of the Model
3 Extensions of the Solow-Swan Model
3.1 Technological Progress
3.2 Human Capital
3.3 Migration
4 The Ramsey-Cass-Koopmans Model
4.1 Households
4.2 Firms
4.3 Equilibrium and Transitional Dynamics
4.4 Extensions of the Ramsey-Cass-Koopmans Model
5 Conclusions
References
Endogenous Growth Theory and Regional Extensions
1 Introduction
2 Modern Growth Theory
2.1 Empirical Findings on Growth
2.2 Modeling R&D as the Source of Growth
2.3 Modeling Education as the Source of Growth
2.4 Concluding Remarks
3 The Geography of Knowledge Creation and Diffusion
3.1 Clusters and Agglomeration
3.2 Knowledge Spillovers and the Role of Cities
3.3 Concluding Remarks
4 The Organization of Knowledge Creation and Diffusion
4.1 Organizational Change as the Enabler of Economic Growth
4.2 Entrepreneurship as the Conduit for Knowledge Spillovers
4.3 Concluding Remarks
5 Fundamental Causes of Growth and Development
5.1 Institutions and Economic Growth
5.2 Concluding Remarks
6 Conclusions
7 Cross-References
References
Incorporating Space in the Theory of Endogenous Growth: Contributions from the New Economic Geography
1 Introduction
2 A Simple Model of Endogenous Growth
2.1 Demand
2.2 Production
2.3 Research and Development
2.4 Equilibrium
2.5 Balanced Growth
3 A Two-Region Model of Growth
3.1 Incorporating Space in the Theory of Growth
3.2 Model Description
3.3 Long-Run Location
4 Spatial Consequences for Economic Growth
4.1 Integration
4.2 Growth in Varieties
4.3 Consumption Growth
4.4 Agglomeration, Freeness of Trade, and Economic Growth
4.5 Impact of Knowledge Spillovers upon Economic Growth
5 Variations to Incorporating Space in the Theory of Growth
5.1 Mobility of Labor and Capital
5.2 Vertically Linked Industry
5.3 Assumptions About Scale Effects
5.4 Other Characteristics
6 Conclusions
7 Cross-References
References
Demand-Driven Theories and Models of Regional Growth
1 Introduction
2 Taxonomy of Theoretical Perspectives on Long-Run Growth
3 Exogenous Demand Determined Regional Expansion
4 The Kaldor-Dixon-Thirlwall Model
5 The Benefits of High Wages and Public Expenditure for Growth
6 Institutional Theories of Regional Growth
7 Conclusions
References
Regional Growth and Convergence Empirics
1 Introduction
2 Growth Regressions: From Theory to Empirics
3 Estimating the Rate of Convergence
3.1 Unconditional and Conditional β-Convergence
3.2 Space and Growth
3.3 Econometric Issues
3.3.1 Endogeneity of Explanatory Variables
3.3.2 Robustness of Explanatory Variables
3.4 Panel Estimation
3.5 Multiple Regimes and Convergence Clubs
4 Sigma-Convergence and Distribution Approach to Convergence
4.1 σ-Convergence
4.2 Studying the Evolution of the Cross-Sectional Distributions
4.3 Distribution Dynamics and Space
5 Conclusion
6 Cross-References
References
The Rise of Skills: Human Capital, the Creative Class, and Regional Development
1 Introduction
2 The Firm/Industry Focus in Regional Research
3 Postindustrialism and the Knowledge Economy
4 Human Capital
5 Occupations and the Creative Class
6 The Organizing Role of Place
7 Conclusions
References
Local Social Capital and Regional Development
1 Introduction
2 What Is (Local) Social Capital?
2.1 Local Social Capital
2.2 Bonding and Bridging Social Capital
3 Social Capital and Regional Development: Theory
3.1 The Entrepreneurship Channel
3.2 The Institutional Channel
4 Social Capital and Regional Development: Evidence
5 Concluding Remarks
6 Cross-References
References
Population Diversity and Regional Economic Growth
1 Introduction
2 Diversity, Productivity, and Growth: Theoretical Arguments
3 Measurement of Population Diversity
4 Identification of Diversity Effects
5 Empirical Evidence
5.1 Regional Analyses
5.1.1 Productivity
5.1.2 Public Goods
5.1.3 New Firm Formation
5.1.4 Innovation
5.2 Multilevel and Firm-Level Analyses
6 Conclusions
7 Cross-References
References
Infrastructure and Regional Economic Growth
1 Introduction
2 Terminology: Infrastructure, Regions, and Growth
3 Infrastructure and Growth: A Spatial Equilibrium Model
4 Causality and Spatial Spillovers
5 Infrastructure as a Real Option
6 Conclusions
7 Cross-References
References
Wellbeing and the Region
1 Introduction
2 Wellbeing
3 Conceptual Frameworks
3.1 The Interaction of Wellbeing and Place
3.2 The Ecology of Development
4 Estimating Context Effects
4.1 The Multilevel Model
5 Applications
5.1 The Urban Paradox
5.2 Wellbeing Inequality
5.3 Wellbeing and Migration
5.4 The Temporal and Spatial Dynamics of Wellbeing
6 Conclusions
7 Cross-References
References
Spatial Policy for Growth and Equity: A Forward Look
1 Introduction
2 Growth Theory and Regional Development Policy
3 Empirical Evidence and Lessons Learned in the EU and the USA
3.1 European Regional Policy
3.2 Regional Policy in the USA
4 Looking Ahead
4.1 Including Spatial Dependence and Reporting the Right Measurements
4.2 Measuring the Actual Investments, Not Proxies
4.3 Combining Different Strands of Theory and Techniques
5 Conclusion
6 Cross-References
References
Section V: Innovation and Regional Economic Development
The Geography of Innovation
1 Introduction
2 The Standard Model: Innovation as Knowledge Production
3 What Is Innovation?
4 Two Faces: The Multidimensional Nature of Innovation
5 How Innovation Works: Linear or Complex?
5.1 Entrepreneurship as Innovation
5.2 Innovation as a Complex Process
6 Global R&D
6.1 Fact 1: Innovation Is Dispersing Globally
6.2 Fact 2: Places Are Competing for Talent and Brains
6.3 Keeping Track
7 Policy: Changing and Reacting to Changes in the Geography of Innovation
7.1 Regional Innovation Policy: Constructing Advantage
7.2 Policy in a World of Global Production Networks and Global Value Chains
8 Conclusions
9 Cross-References
References
Systems of Innovation and the Learning Region
1 Introduction
2 Regional Governance and Learning
3 Regional Governance and Policy Learning
4 The Learning Region
5 Regional Platforms, Methods, and New Innovation Policies
6 Regional Innovation Systems Version 3.0: Learning Dilemmas
7 Coevolution
8 Complexity
9 Emergence
10 Policy Emergence and Learning
11 Conclusions
References
Networks in the Innovation Process
1 Introduction
2 Networks and Associated Concepts
3 Knowledge Networks in a Knowledge-Based Economy
4 Innovation Networks and Different Types of Proximity
5 Innovation Networks: Some Methodological Approaches
6 Conclusions
7 Cross-References
References
The Geography of R&D Collaboration Networks
1 Introduction
2 The Networked Character of R&D
2.1 What Are R&D Collaboration Networks?
2.2 The EU Framework Programme for Supporting R&D Collaboration Networks
3 R&D Collaboration Networks in a Regional Context
3.1 Different Types of R&D Networks, and Some Empirical Lenses on Them
3.2 Properties of R&D Collaboration Networks
3.3 Visualization of Cross-Region R&D Collaboration Networks
3.4 Explaining Differing Intensities of Cross-Region Collaboration
4 Conclusions
5 Cross-References
References
Generation and Diffusion of Innovation
1 Introduction
2 Innovations and Heterogeneity of Firms and Places
3 Innovations, Diffusion, and Technological Development
3.1 Renewal Activities and Firm Performance
3.2 Characteristics and Performance of Firms
3.3 Proximity and Networks
4 Innovation, Regional Milieu, and Networks
4.1 A Functional Region Is an Arena for Face-to-Face Contacts
4.2 Urbanization and Localization
4.3 Accessibility of Knowledge Sources
5 Diffusion of Ideas and Technical Solutions
5.1 The Diffusion Model
5.2 Technology Diffusion and R&D Spillovers
5.3 Innovation Ideas and New Products
6 Conclusions
References
Cities, Knowledge, and Innovation
1 Introduction
2 Knowledge Creation and Diffusion
3 Mechanisms of Knowledge Production and Diffusion in Cities
4 Agglomeration, Variety, and Pecuniary External Effects
5 Knowledge Spillovers in the Urban Agglomeration Literature
6 Conclusions
7 Cross-References
References
Knowledge Flows, Knowledge Externalities, and Regional Economic Development
1 Introduction
2 Nature of Knowledge and Knowledge Flows
3 Aspects of Knowledge Externalities
3.1 Economic Nature of Knowledge Externalities
3.2 Sources of Knowledge Externalities
3.3 Recipients of Knowledge Externalities
3.4 Mechanisms of Knowledge Externalities
3.5 Geographic Reach of Knowledge Externalities
3.6 Consequences of Knowledge Externalities
4 Knowledge Externalities and Regional Economic Development
4.1 New Economic Geography Models and Knowledge-Based Regional Growth
4.2 Interregional Knowledge Flows and Multiregional Growth
4.3 Microeconomic Aspects of Knowledge Spillovers, Knowledge Externalities, and Regional Economic Development
4.4 Entrepreneurship, Clusters, and Regional Economic Development
5 Knowledge Externalities and Regional Economic Development: Policy Conclusions
6 Conclusions
7 Cross-References
References
Knowledge Spillovers Informed by Network Theory and Social Network Analysis
1 Introduction
2 Knowledge Spillovers
3 Application of Network Theory to Knowledge Spillover Research
4 Conclusions and Future Research
5 Cross-References
References
Clusters, Local Districts, and Innovative Milieux
1 Introduction
2 Core Concepts
2.1 The Marshallian Concept
2.2 The Neo-Marshallian Industrial District Concept
2.3 The Innovative Milieu Concept
2.4 The Industrial Cluster Concept
3 Similarities and Dissimilarities
3.1 Geographies and Space
3.2 Actors and Interactions
3.3 Industries and Innovation
3.4 Environment, Competition, and Cooperation
4 Conclusions
References
Section VI: Regional Policy
The Case for Regional Policy: Design and Evaluation
1 Introduction
2 The Case for Regional Policies: Economic Considerations
3 The Case for Regional Policies: Noneconomic Reasons
4 What Sort of Regional Policy?
5 How to Assess the Effectiveness of Regional Policies
6 Conclusions
7 Cross-References
References
Further Readings
Place-Based Policies: Principles and Developing Country Applications
1 Introduction
2 Place-Based Policies: The Spatial Context
2.1 Proximity, Connectivity, and Clusters
2.2 Complementarities and Coordination
2.3 Adjustment Mechanisms
2.4 Spatial Equilibrium
3 Place-Based Policies: Quantity Effects and Valuation
3.1 Quantity Effects
3.2 Valuation: Market Failure
3.3 Policy Targeting
4 Place-Based Policies: Zones, Corridors, and Regions
4.1 Special Economic Zones
4.2 Corridors and Long-Distance Transport Infrastructure
4.3 Lagging Regions and Big-Push Policies
4.4 Local Economic Development Policies
5 Concluding Comments
6 Cross-References
References
New Trends in Regional Policy: Place-Based Component and Structural Policies
1 Introduction: Regional Disparities and Place-Based Policies
2 Three Main Regional Scales and Their Characteristics
2.1 Tertiarization of Economies and Competitive Edge of Large Cities
2.2 Rural Regions Close to Cities and Agglomeration Effects
2.3 Heterogeneity of Remote Rural Regions and Tradable Activities
3 Differentiated Regional and Urban Policy Responses
3.1 Policy Responses in Cities
3.2 Policy Responses in Rural Areas Close to Cities
3.3 Policy Responses in Remote Rural Regions
4 Conclusion: An Evolving Regional Policy Paradigm
5 Cross-References
Annex: A Typology to Classify TL3 Regions
References
Further Reading (OECD References)
European Regional Policy: What Can Be Learned
1 Introduction
2 When and Why Was the European Regional Policy Implemented?
2.1 A Long Period Without an EU Regional Policy
2.2 Factors Driving the Implementation of a European Regional Policy
2.2.1 Essentially Political Arguments and Reasons Linked to the Incorporation of New Countries and the Goal of Accelerating th...
2.2.2 Economic and Political Arguments and the Lack of a Theoretical Basis
3 The Bases of the New Regional Policy and Their Implementation Through Multiannual Programs
3.1 An Important Framework: Decision on the Essential Principles of the New Policy
3.2 Key Aspects of the First Two Multiannual Programs
3.3 A Balance of the Implementation and Results of the First Phase of the European Regional Policy: 1989-1999
4 Changes in European Regional Policy During the 2000-2020 Period
4.1 The 2000-2006 and 2007-2013 Programs
4.2 The 2014-2020 Program: Key Aspects and Critical Elements
5 What Can Be Learned? An Evaluation of Achievements, Weaknesses, and Perspectives of the European Regional Policy
5.1 Positive Aspects of the European Regional Policy or Cohesion Policy
5.2 Criticisms and Some Concerning Topics
5.3 Positive Factors for Regional Development and Greater European Regional Policy Effectiveness
5.4 Future Perspectives
6 Conclusions
7 Cross-References
References
Further Readings
Regional Policies in East-Central Europe
1 Introduction
2 East-Central Europe Before Accession to the EU
2.1 The New Beginnings
3 Post-socialist Transformation
3.1 Regional Patterns of Transformation
3.2 Regional Policies Before Accession to the European Union
3.3 Decentralization
3.4 Preparation for Accession
3.5 Equity or Efficiency?
4 Regional Policies in the East-Central European Countries After Accession
4.1 The First Period (2004-2006)
4.2 Regional Policies in the E-CE Countries After 2007
4.3 Models of Regional Policies
4.4 The Effects
5 The Future
6 Conclusions
7 Cross-References
References
Special Economic Zones and the Political Economy of Place-Based Policies
1 Introduction
2 A History of the Special Economic Zone
3 Special Economic Zones: A Place in Search of a Theory
3.1 A Policy in Search of a Place (and a Theory)
3.2 The Institutional Angle
4 What Have SEZs Achieved?
4.1 Increased Foreign Investment
4.2 Employment Creation
4.3 Supporting Policy Reforms
5 Conclusion
6 Cross-References
References
Further Reading
Rural Development Policies at Stake: Structural Changes and Target Evolutions During the Last 50 Years
1 Introduction
2 On Rural and Rural Policies
3 Evolution of Rural Development Patterns in the Long Run: Successive Attempts to Deal with the Diversity and Sustainability o...
4 Rural Development Policies: Big Differences According to the Major Regions of the World
4.1 USA and Australia: The Primacy Given to Agricultural Development Policies
4.2 Ex-Communist China, the Persistence of a Rural Sector in the Service of Economic Development
4.3 Europe, an Unfinished Quest for Territorial Cohesion and Sustainability of the Agricultural Sector
4.4 Japan, a Residual Rural Policy
4.5 Brazil, a Dual Strategy Between the Competitiveness of Agro-Industrial Sectors and the Structuring of Territorial Approach...
4.6 What Impact of Rural Development Policies?
5 Conclusion
6 Cross-References
References
Section VII: New Economic Geography, Evolutionary Economic Geography, and Regional Dynamics
Schools of Thought on Economic Geography, Institutions, and Development
1 Introduction
2 New Economic Geography
3 The New Urban Agenda
4 The Evolutionary and Institutional School
5 Conclusions
6 Cross-References
References
Further Reading
New Economic Geography and the City
1 Introduction
2 Cities and Tradable Goods
2.1 The Spatial Economy
2.1.1 Consumers
2.1.2 Land Market and Urban Costs
2.1.3 Producers
2.1.4 Labor Market and Wages
2.2 The Formation of Manufacturing Cities
2.3 Extensions
2.3.1 Agents' Heterogeneity
2.3.2 The Impact of Local Policies
2.3.3 Technological Progress in Manufacturing
2.3.4 The Bell-Shaped Curve of Spatial Development
2.4 The Decentralization of Jobs Within Cities
2.4.1 Polycentric Cities
2.4.2 The Emergence of Polycentric Cities
3 Cities and Services
3.1 The Formation of Consumption Cities
3.2 Extensions
3.2.1 Spatial Externalities
3.2.2 Exogenous Amenities
4 The Size and Industrial Structure of Cities
4.1 Manufactured Goods and Services
4.1.1 The Export Base Theory Revisited
4.1.2 Urban Hierarchy
4.1.3 Relative Comparative Advantage
4.2 The Impact of Trade on the Structure of Cities
5 The Future of Cities
5.1 Urban Crime
5.1.1 Criminal Activity and City Size
5.1.2 The Emergence of Large Cities with High Criminal Activity
5.2 Are Compact Cities Ecologically Desirable?
5.2.1 The Ecological Trade-Off Between Commuting and Transport Costs
5.2.2 Does the Market Yield a Good, or a Bad, Ecological Outcome?
5.3 Cities in Aging Nations
5.4 Do New Information and Communication Technologies Foster the Death of Cities?
6 Conclusion
7 Cross-References
References
New Economic Geography After 30 Years
1 Introduction
2 Increasing Returns and Intra-industry Trade
2.1 Basic Ingredients of the Model
2.1.1 Demand
2.1.2 Supply
2.1.3 Equilibrium
2.1.4 International Trade
2.2 The Home Market Effect as a Volume Effect
2.3 The Home Market Effect as a Factor Price Effect
3 Adding Labor Mobility
3.1 Core New Economic Geography Model
3.2 Model Extensions
3.2.1 Intermediate Goods
3.2.2 Factors of Production
3.2.3 Heterogeneous Firms
4 Empirical Testing
5 Policy Consequences
6 Conclusions
7 Cross-References
References
Further Reading
Relational and Evolutionary Economic Geography
1 Introduction
2 Segmented Cluster Paradigms
3 Network Relations and the Knowledge-Based Conception of Clusters
4 Regional Path Dependence and Cluster Life Cycles
5 Toward an Integrated Relational-Evolutionary Model of Cluster Dynamics
6 Conclusions
7 Cross-References
References
Path Dependence and the Spatial Economy: Putting History in Its Place
1 Introduction
2 Path Dependence as Self-Reinforcing Spatial Economic "Lock-In": What Does It Mean and How Common Is It?
3 Rethinking Path Dependence: From "Lock-In" to Ongoing Path Evolution
4 Toward a "Developmental-Evolutionary" Model of Path Dependence in the Spatial Economy
5 Conclusion
6 Cross-References
References
Agglomeration and Jobs
1 Introduction
2 Cities, Worker Productivity, and Wages
3 Firm Dynamics Within Cities
4 City Functionality, Urban Systems, and Policies
5 Conclusions
6 Cross-References
References
Changes in Economic Geography Theory and the Dynamics of Technological Change
1 Introduction
2 The Linear Model of Innovation: The A-Spatial Benchmark
3 Physical "Distance" Between Innovative Agents and Knowledge Flows
4 Innovative Agents "in Context": Local Specialization Patterns and Institutions
4.1 Economic Places: Industrial Specialization
4.2 Relational-Institutional Places
5 Bringing Different Approaches Together (1): Non-spatial Proximities, "Integrated" Frameworks, and Local-Global Connectivity
6 Bringing Different Approaches Together (2): Local-Global Connectivity
7 Conclusions
8 Cross-References
References
Geographical Economics and Policy
1 Introduction
2 Empirical Analysis: Data
3 Empirical Analysis: Causality
4 Policy Evaluation
5 Conclusions
6 Cross-References
References
New Geography Location Analytics
1 Introduction
2 Analytics Overview
3 Service Allocation
4 Spatial Optimization
5 Location Modeling
5.1 Transportation
5.2 Congestion
5.3 Dispersion
5.4 Economies of Scale
6 Conclusions
7 Cross-References
References
Further Reading
Geography and Entrepreneurship
1 Introduction
2 Why Study the Geography of Entrepreneurship?
3 Entrepreneurs and Their Support Systems
3.1 Entrepreneurship: Individual Characteristics
3.2 Entrepreneurship: Demand Factors
3.3 Entrepreneurship: Supply Factors
3.4 Social but Real: Local Informal Institutions
3.5 Local Formal Institutions: Permits and Politics
3.6 The "Correct" Spatial Level of Observation
4 Concluding Remarks
5 Cross-References
References
The New Economic Geography in the Context of Migration
1 Introduction
2 Migration in New Economic Geography Models
2.1 Putting New Economic Geography Models into Practice: Examples
2.2 Does the New Economic Geography Realistically Portray Migration?
3 Conclusion
4 Cross-References
References
Further Reading
Environmental Conditions and New Geography: The Case of Climate Change Impact on Agriculture and Transportation
1 Introduction
2 Impact on Agriculture
2.1 Current State of the Literature
2.2 Challenges Ahead
3 Impact on the Transportation System
3.1 Literature Review
3.2 Climate Change Impact and Resilience
4 Conclusion
5 Cross-References
References
Social Capital, Innovativeness, and the New Economic Geography
1 Introduction
2 Social Capital and Innovativeness
3 Social Capital, Innovativeness, and the New Economic Geography
3.1 A Short Introduction to New Economic Geography
3.2 New Economic Geography and Knowledge-Based Growth
3.3 New Economic Geography, Social Capital, and Innovativeness
4 Conclusion
5 Cross-References
References
Section VIII: Environment and Natural Resources
Spatial Environmental and Natural Resource Economics
1 Introduction
2 Spatial Heterogeneity and Optimal Policy
2.1 Spatial Heterogeneity in Conservation
2.2 Effect of Space on Market-Based Solutions
3 Spatial Elements of Nonmarket Valuation
3.1 Hedonic Valuation
3.2 Travel-Cost Analysis
3.3 Stated Preference Valuation Techniques
4 Spatial Empirical Identification Strategies
4.1 Environment and Health
4.2 Evaluations of Protected Areas and Payment for Environmental Services Programs
5 Models of Behavior in Space
5.1 Spatial Sorting Models
5.2 Behavior in Resource Use and Conservation
6 Conclusions
7 Cross-References
References
Further Reading
Dynamic and Stochastic Analysis of Environmental and Natural Resources
1 Introduction
2 The Canonical Resource Management Model
3 Resource Management Under Uncertainty
3.1 Uncertain T
3.1.1 Ignorance Uncertainty
3.1.2 Exogenous Uncertainty
3.2 Stochastic Stock Dynamics
3.3 Discounting
3.4 Instantaneous Benefit
3.5 Post-Planning Value
3.6 Compound Uncertainties
4 Integrating Natural Resources and Aggregate Growth Models
4.1 An Integrated Model
4.2 Uncertainty in the Integrated Model
5 Irreversibility and Uncertainty
6 Knightian Uncertainty
7 Conclusions
References
Game Theoretic Modeling in Environmental and Resource Economics
1 Introduction
2 Static Games
2.1 The Emissions Game
2.2 Sustaining Cooperation in a Noncooperative World
2.2.1 Stage 2: The Emissions Game Revisited
2.2.2 Stage 1: The IEA Game
3 Dynamic Games: Some Concepts
4 Transboundary Stock Pollutants
4.1 A Benchmark Model
4.2 Centralized Versus Regional Control of Pollution
5 Provision of Clean Air and Interregional Mobility of Capital
6 Conclusion
References
Methods of Environmental Valuation
1 Introduction
2 Benefit Measures
2.1 Use Values
2.2 Nonuse or Passive Use Values
3 Overview of Methods and How They Relate to Values
4 Hedonic Property Method
4.1 Economic Theory Underlying the Hedonic Property Method
4.2 Data Requirements
4.3 Econometric Modeling Including Spatial Dimensions
5 Travel Cost Models
5.1 Single-Site Trip Frequency Models of Recreation Demand
5.2 Valuing Quality Changes Using Trip Frequency Models
5.3 Multiple-Site Selection Models
5.4 Data Requirements for Travel Cost Models
6 Stated Preference Models
6.1 Contingent Valuation Method
6.2 Choice Experiments
6.3 The Issue of Bias in Stated Preference Surveys
7 Combining Stated and Revealed Preference Methods and Data
8 Benefit Transfer
9 Conclusion
10 Cross-References
References
The Hedonic Method for Valuing Environmental Policies and Quality
1 Introduction
2 Value of Statistical Life
3 Hedonic Valuation of Environmental Quality
4 Wage Compensation for Environmental Amenities
5 Property Value Compensation for Environmental Amenities
6 Wage and Property Value Hedonics Are Not Alternatives: The Multimarket Hedonic Method
7 What if Single-Market Hedonic Analyses Are Employed Rather than Multimarket Analyses?
8 Conclusion
References
Materials Balance Models
1 Introduction
1.1 Material Balance by Total Mass
1.2 Material Balance by Element
2 Applications of Material Flow Analysis
2.1 Studying Flows of Substances, Materials, and Products
2.2 Studying Firms, Sectors, and Geographical Areas
3 Case Studies
3.1 Material Balance Applied to Chemical Industry
3.1.1 Inorganic Chemicals
Sulfuric Acid
Ammonia
Chlorine and Sodium Hydroxide
Results
3.1.2 Organic Chemicals
3.2 Mass Balance Applied to Rare Earth Metals
3.2.1 Dopants for Semiconductor in Phosphors
3.2.2 Catalysts
Fluid Catalytic Cracking (FCC)
Autocatalyst Converters
3.2.3 Electrical Storage in NiMH Batteries
3.2.4 Alloying Elements
Wind Turbines
Electric Vehicles
Magnetic Resonance Imaging (MRI)
Minor Alloys
3.2.5 Additives
Glass Industry
Ceramic Industry
3.2.6 Abrasives
3.2.7 Results of MFA of Rare Earths
4 The Laws of Thermodynamics
5 Conclusions
References
Climate Change and Regional Impacts
1 Introduction
2 Expected Impacts Based on Level of Urbanization
2.1 Density-Dependent Impacts
2.2 Agriculture and Forests Impact
2.3 Natural Landscapes
3 Regional Differences in Risk and Mitigation Capacity
3.1 North America
3.2 Europe
3.3 Asia
3.4 Latin America
3.5 Africa
3.6 Australia and New Zealand
4 Interconnectivity and Reach of Impacts
5 Global Social Justice
6 Conclusion
Appendix
References
Urban and Regional Sustainability
1 Introduction
2 Principles
3 Sustainable Cities and Regions
4 Implementing Sustainability Goals
5 Conclusion
6 Cross-References
References
Population and the Environment
1 Introduction
2 A Century of Dramatic Population Growth and Population-Environment Theories
3 Regional Differentials
4 Micro Perspective on Population Decisions and Community Resilience and High-Level Effects
5 Conclusions
6 Cross-References
References
Section IX: Spatial Analysis and Geocomputation
Geographic Information Science
1 Introduction
2 Principles of GIScience
2.1 The Characteristics of Geographic Information
2.2 Dealing with Large Data Volumes
2.3 Scale-Related Issues
2.4 Simulation in GIScience
2.5 Achievements of GIScience
3 Changing Practice and Changing Problems
3.1 CyberGIS and Parallel Processing
3.2 The Social Context of GISystems
3.3 GIScience and Data Science
3.4 Neogeography, Wikification, Open Data, and Consumer Data
4 Conclusion
References
Geospatial Analysis and Geocomputation: Concepts and Modeling Tools
1 Introduction
2 Geocomputation and Spatial Analysis
3 Geocomputational Models Inspired by Biological Analogies
4 Networks, Tracks, and Distance Computation
5 Computational Spatial Statistics
6 Conclusions
References
Bayesian Spatial Analysis
1 Introduction
2 Kinds of Spatial Data
3 Bayesian Approaches for Point Data
4 A Roughness-Based Prior
5 Bayesian Approaches for Point-Based Measurement Data
6 Region-Based Measurement Data
7 Conclusions
References
Geovisualization
1 Introduction
2 Statistical Graphics
2.1 Histograms
2.2 Box Plots
2.3 Scatterplots
2.4 Parallel Coordinate Plots
3 Choropleth Maps
3.1 Color
3.2 Class Intervals
4 Exploratory Data Analysis
5 Exploring Time
5.1 Animation
5.2 Space-Time Cube
6 Conclusion
References
Web-Based Tools for Exploratory Spatial Data Analysis
1 Introduction
2 Exploratory Spatial Data Analysis
3 Exploratory Spatial Temporal Data Analysis
4 The Evolution of Tools for Exploratory Data Analysis/Exploratory Spatial Temporal Data Analysis
4.1 Early Desktop Applications of ESDA/ESTDA
4.2 Coupling or Extending GIS with ESDA/ESTDA Functionality
4.3 Developments in R Related to ESDA/ESTDA
5 Early Web Developments in ESDA
6 The Move Toward Web-Based Mapping
6.1 Open Source Software and Open Data
6.2 Movement Toward a Service Oriented Architecture
6.3 The Emergence of Commercial Web Mapping Services
6.4 Key Developments in Python and PySAL
7 Web-Based Tools for ESDA/ESTDA
7.1 Web Mapping Platforms and Tools with Little ESDA/ESTDA Functionality
7.2 Online Versions of GIS Software
7.3 Building Customized Web Applications
7.4 GeoDa-Web
7.5 CyberGIS
8 Conclusion
9 Cross-References
References
Further Reading
Spatio-temporal Data Mining
1 Introduction
2 Spatio-temporal Dependence
2.1 Statistical Approaches
2.2 Data Mining Approaches
2.3 Summary
3 Space-Time Forecasting and Prediction
3.1 Statistical (Parametric) Models
3.1.1 Space-Time Autoregressive Integrated Moving Average
3.1.2 Spatial Panel Data Models
3.1.3 Space-Time Geographically Weighted Regression
3.1.4 Space-Time Geostatistics
3.2 Machine Learning (Nonparametric) Approaches
3.2.1 Artificial Neural Networks
3.2.2 Kernel Methods
3.3 Summary
4 Space-Time Clustering
4.1 Partitioning and Model-Based Clustering
4.2 Hierarchical Clustering
4.3 Density- and Grid-Based Clustering
4.4 Statistical Tests for Clustering
4.5 Summary
5 Spatio-temporal Visualization
5.1 Two-Dimensional Maps
5.2 Three-Dimensional Visualization
5.3 Animated Maps
5.4 Visual Analytics: The Current Visualization Trend
5.5 Summary
6 Conclusion
7 Cross-References
References
Scale, Aggregation, and the Modifiable Areal Unit Problem
1 Introduction
2 MAUP Definitions
2.1 The Scale Effect
2.2 The Zonation Effect
3 Approaches to Understanding
3.1 From Univariate Statistics to Spatial Models
3.2 The Importance of Spatial Autocorrelation
3.3 Exploring the MAUP Through Zone Design
4 Conclusion
5 Cross-References
References
Spatial Network Analysis
1 Introduction
2 Spatial Networks and Graphs
2.1 Basic Definitions
2.2 Vertex Degree, Graph Density, and Local Clustering
2.3 Spatial Embedding and Planarity
2.4 Shortest Paths, Distances, and Network Efficiencies
2.5 Acyclic Graphs
3 Higher Order Structure in Networks
3.1 Network Centrality
3.2 Community Detection, Modules, and Subgraphs
3.3 Structural Equivalence
4 Generating Networks: Spatial Network Models
4.1 Spatial Networks from Point Patterns
4.2 Spatial Small Worlds
4.3 Growing Spatial Networks: Preferential Attachment
4.4 Dual Graphs: New Graphs from Old
5 Matrix and Adjacency List Representation of Graphs
6 Properties of Real-World Spatial Networks
6.1 Road Networks
6.2 Transport Networks
6.3 Other Spatially Embedded Networks
6.4 Network Routing
7 Conclusions
8 Cross-References
References
Cellular Automata and Agent-Based Models
1 Introduction
2 Complexity and Models of Complexity
3 Cellular Automata: Origins
4 Cellular Automata: Key Contributions
5 Cellular Automata: Applications
6 Agent-Based Models: Origins
7 Agent-Based Models: Key Contributions
8 Agent-Based Models: Applications
9 Conclusion
References
Spatial Microsimulation
1 Introduction
2 Defining Spatial Microsimulation
3 Static and Dynamic
4 Microsimulation as a Tool for Policy
5 Spatial Microsimulation Algorithms
5.1 Deterministic Reweighting
5.2 Conditional Probabilities
5.3 Simulated Annealing
6 Which Algorithm?
7 Case Study: Estimating Smoking Prevalence Locally
8 Conclusions
9 Cross-References
References
Fuzzy Modeling in Spatial Analysis
1 Introduction
2 Fundamental Concepts and Operations of Fuzzy Set Theory
2.1 Basic Definitions and Operations
2.2 Fuzzy Relations
2.3 Linguistic Variable and Possibility Distribution
3 Spatial Data Clustering
3.1 Problems of the Two-Valued Logic Approach to Regionalization
3.2 Representation of a Region
3.3 The Core and Boundary of a Region
3.4 Regional Assignment Procedure
3.5 Grouping via Fuzzy Relation and Fuzzy Clustering
3.5.1 Grouping Based on the Concept of Similarity
3.5.2 Grouping for Regions via Fuzzy c-Means Clustering
4 Spatial Optimization Problems
4.1 Optimization of a Precise Objective Function over a Fuzzy Constraint Set
4.2 Optimization of a Fuzzy Objective Function over a Fuzzy Constraint Set
5 Conclusion
6 Cross-References
References
Section X: Spatial Statistics
Geostatistical Models and Spatial Interpolation
1 Introduction
2 Characterizing Spatial Variation
2.1 The Experimental Variogram
2.2 Variogram Model Fitting
2.3 Example
3 Spatial Interpolation with Kriging
3.1 Simple and Ordinary Kriging
3.2 Simulation
4 The Change of Support Problem
4.1 Regularization
4.2 Variogram Deconvolution
4.3 Area-to-Point Kriging
5 Geostatistics and Regional Science
6 Conclusions
References
Spatial Sampling
1 Introduction
1.1 Context
1.2 One-Dimensional Sampling
1.3 Two-Dimensional Sampling
2 Geostatistical Sampling
2.1 Designs for Variogram Estimation
2.2 Optimal Designs to Minimize the Kriging Variance
2.3 Sampling in a Multivariate Context
3 Second-Phase Sampling
4 Numerical Example
5 Search Strategies
6 Conclusion
7 Cross-References
References
Exploratory Spatial Data Analysis
1 Introduction
2 Types of Spatial Data
3 Basic Visualization and Exploration Techniques via Maps
3.1 Choropleth Maps
3.2 Linked Micromap Plots
3.3 Conditioned Choropleth Maps
4 ESDA via Linking and Brushing
5 Local Indicators of Spatial Association (LISA)
6 Software for ESDA
6.1 ESDA and GIS
6.2 Stand-Alone Software for ESDA
7 Conclusions
8 Cross-References
References
Further Reading
Spatial Autocorrelation and Moran Eigenvector Spatial Filtering
1 Introduction
1.1 Spatial Autocorrelation: A Conceptual Overview
1.2 Spatial Filtering: A Conceptual Overview
2 Moran Eigenvector Spatial Filtering
2.1 Eigenvectors of a Spatial Weight Matrix
2.2 An Eigenvector Spatial Filter Specification of the Linear Regression Model
2.3 The Numerical Calculation of Eigenvectors
2.3.1 R Code for Generating MCM Eigenvectors
2.3.2 Matlab Code for Generating MCM Eigenvectors
2.3.3 MINITAB Code for Generating MCM Eigenvectors
2.3.4 FORTRAN Code for Generating MCM Eigenvectors
2.3.5 SAS Code for Generating MCM Eigenvectors
2.4 Statistical Features of Moran Eigenvector Spatial Filtering
3 Comparisons with the Spatial Lag Term
4 An Empirical Example: A Moran Eigenvector Spatial Filtering Methodology Example
5 Extensions to Spatial Interaction Data Analysis
6 Extensions to Space-Time Data Analysis
7 Conclusions
8 Cross-References
Appendix A
References
Geographically Weighted Regression
1 Introduction
2 Model Specification
3 Model Estimation
4 Implementation
5 Issues
5.1 Statistical Inference
5.2 Collinearity
6 Diagnostic Tools
7 Extensions
8 Alternatives
8.1 Bayesian Spatially Varying Coefficient Models
8.2 Local Weighted Quantile Sum Regression
9 Application: Residential Chlordane Exposure in Los Angeles County
10 Conclusions
11 Cross-References
References
Bayesian Modeling of Spatial Data
1 Introduction
2 Spatially Autoregressive Regression
3 Disease Mapping: Conditional Autoregressive Priors in Spatial Epidemiology
4 Models for Point Referenced Data (Spatial Processes)
5 Space-Time Models
6 Point Pattern Data
7 Focused Clustering Models
8 Conclusion
References
Spatial Models Using Laplace Approximation Methods
1 Introduction
2 Integrated Nested Laplace Approximation
2.1 Gaussian Markov Random Field
2.2 Priors
2.3 Model Criticism and Selection
2.4 Implementation
2.5 Other Features
3 Spatial Models
3.1 Geoadditive Mixed Effects Models
3.2 Disease Mapping
3.2.1 Geostatistical Models
3.2.2 Point Process Models
3.2.3 Examples
3.2.4 Geostatistics
3.2.5 Lattice Data
3.2.6 Point Patterns
4 Conclusions
5 Cross-References
References
Spatial Data and Spatial Statistics
1 Introduction
2 Analyzing Spatial Data: Four Challenges
2.1 Spatial Dependency
2.2 Spatial Heterogeneity
2.3 Data Sparsity
2.4 Uncertainty
3 Spatial Statistical Modeling
3.1 Spatial Statistical Modeling: The Big Picture
3.2 Spatial Statistical Modeling: Frequentist or Bayesian Inference?
4 Approaches to Modeling Spatial and Spatial-Temporal Data
4.1 Bayesian Spatial Econometric Models
4.2 Bayesian Spatial Hierarchical Models
4.3 Comparing Spatial Econometric and Hierarchical Modeling Approaches
5 Conclusion
6 Cross-References
References
Multivariate Spatial Process Models
1 Introduction
2 Classical Multivariate Geostatistics
2.1 Co-kriging
3 Some Theory for Cross-Covariance Functions
4 Separable Models
5 Co-regionalization
5.1 Nested Models
5.2 Conditional Development of the Linear Models of Co-Regionalization
5.3 A Spatially Varying Linear Model of Co-Regionalization
6 Other Constructive Approaches
6.1 Kernel Convolution Methods
6.2 Convolution of Covariance Functions Approaches
6.3 A Latent Dimension Approach
6.4 Matérn Cross-Covariance Functions
7 Multivariate Spatial Regression Models
7.1 Modeling with Co-regionalization
7.2 Generalized Linear Model Setting
7.3 Spatially Varying Coefficients Models
7.3.1 Details for a Single Covariate
8 An Example
9 Conclusions
10 Cross-References
References
Spatial Dynamics and Space-Time Data Analysis
1 Introduction
2 Spatial Dynamics in Regional Science
3 Exploratory Space-Time Data Analysis
3.1 ETSDA Methods
3.2 ESTDA Methods
4 Conclusion
References
Spatial Clustering and Autocorrelation of Health Events
1 Introduction
2 What Do We Want?
3 What Data Do We Have?
4 What Tools Do We Have and What Do They Find?
4.1 Indices of Spatial Autocorrelation
4.2 Classification Methods
4.3 Cluster Detection in Point-Level Data: Scan Statistics and Intensity Estimates
4.4 Cluster Detection for Area-Level Data
4.5 Other Approaches
5 What Do We Do with a Cluster?
6 Conclusion
7 Cross-References
References
Section XI: Spatial Econometrics
Maximum Likelihood Estimation
1 Introduction
2 Likelihoods
2.1 Likelihood with Continuous y and Normal Disturbances
2.2 Likelihood with Binomial y and Normal Disturbances
3 Inference and Estimation
3.1 Simplest Approach to Likelihood Estimation and Inference
3.2 Variance-Covariance Matrix Approach to Likelihood Inference
4 Spatial Error Model Example
5 Computational Details
5.1 Concentrated Log-Likelihood
5.2 Sparsity
5.3 Log-Determinant Calculations
6 Conclusions
References
Bayesian Markov Chain Monte Carlo Estimation
1 Introduction
2 Spatial Regression and Prior Modeling
3 Bayesian Inference via MCMC
3.1 A Brief Review of MCMC Theory
3.2 Stationary Distributions and a Central Limit Theorem for MCMC
4 MCMC Algorithms
4.1 Gibbs Sampling
4.2 Metropolis-Hastings
4.2.1 The Metropolis Algorithm
4.2.2 Metropolis-Hastings Algorithm
4.3 Choice of Proposal Distribution
5 Practical Considerations
5.1 Setting Up and Monitoring MCMC Chains
5.2 Other Tools and Post-Sampling Inference
6 MCMC Inference for the Spatial Durbin Model with Marginal Augmentation
7 Spatiotemporal Model
7.1 Empirical Application
7.2 Estimation Results
8 Conclusion
References
Instrumental Variables/Method of Moments Estimation
1 Introduction
2 A Primer on GMM Estimation
2.1 Model Specification and Moment Conditions
2.2 One-Step GMM Estimation
2.3 Two-Step GMM Estimation
3 GMM Estimation of Models with Spatial Lags
3.1 GMM Estimation of Spatial-Autoregressive Parameter
3.2 GMM Estimation of Regression Parameters
3.3 Guide to Literature
3.4 Exemplary GMM Estimators
4 GMM Estimation of Models with Spatial Mixing
5 Conclusion
References
Cross-Section Spatial Regression Models
1 Introduction
2 Spatial Effects in Cross-Sectional Models
2.1 Forms of Spatial Autocorrelation in Regression Models
2.2 Spatial Lag Model
2.2.1 Multiplier and Diffusion Effects
2.2.2 Interpretation of Coefficient Estimates
2.2.3 Variance-Covariance Matrix
2.2.4 Endogenous Spatial Lag and Heteroskedasticity
2.3 Cross-Regressive Model: Lagged Exogenous Variable
2.4 Models with Spatial Error Autocorrelation
2.4.1 Spatial Autoregressive Process
Spatial Diffusion
Variance-Covariance Matrix
Constrained Spatial Durbin Model
2.4.2 Spatial Moving Average Process
2.4.3 Kelejian and Robinson Specification
2.4.4 Direct Representation and Non-Parametric Specifications
2.5 Encompassing Specifications
2.6 Higher-Order Spatial Models
2.7 Heteroskedasticity
2.8 Parameter Instability
3 Specification Tests in Spatial Cross-Sectional Models
3.1 Moran's I Test
3.2 Tests of a Single Assumption
3.2.1 Spatial Error Autocorrelation
3.2.2 Kelejian-Robinson Specification
3.2.3 Common Factor Test
3.2.4 Test of an Endogenous Spatial Lag
3.3 Tests in Presence of Spatial Autocorrelation or Spatial Lag
3.3.1 Joint Test
3.3.2 Conditional Tests
3.3.3 Robust Tests
3.4 Specification Search Strategies
3.5 Non-Nested Tests
3.6 Spatial Autocorrelation and Spatial Heterogeneity
3.6.1 Spatial Autocorrelation and Heteroskedasticity
3.6.2 Spatial Autocorrelation and Parameter Instability
4 Conclusion
References
Spatial Panel Models and Common Factors
1 Introduction
2 Linear Spatial Dependence Models for Cross-Section Data
3 Linear Spatial Dependence Models for Panel Data
4 Dynamic Linear Spatial Dependence Models for Panel Data
5 Direct, Indirect, and Spatial Spillover Effects
6 Empirical Illustration
7 Conclusion
8 Cross-References
References
Limited and Censored Dependent Variable Models
1 Introduction
2 Limited and Censored Variable Models
2.1 Models for Discrete Responses
2.2 Models for Censored and Truncated Data
3 Models Incorporating Spatial Effects
3.1 Geographically Weighted Regression
3.2 Spatial Filtering
3.3 Spatial Regression
4 Estimation Approaches
4.1 Maximum Simulated Likelihood Estimation (MSLE)
4.2 Composite Marginal Likelihood
4.3 Bayesian Approach
5 Conclusions
References
Spatial Econometric OD-Flow Models
1 Introduction to Gravity or Spatial Interaction Models
2 Gravity or Spatial Interaction Models Based on Independence
3 Spatial Autoregressive Interaction Models
4 Problems that Arise in Applied Modeling of Flows
5 Interpreting Spatial Interaction Models
5.1 A Numerical Illustration for the Nonspatial Gravity Model
5.2 A Numerical Illustration for the Spatial Gravity Model
6 Conclusion
References
Interpreting Spatial Econometric Models
1 Introduction
2 Spatial Regression Models
2.1 Spatial Error Models
2.2 Spatial Lag of X Models
2.3 Spatial Lag of y Models
2.4 Measures of Dispersion for the Effects Estimates
2.5 Partitioning Global Effects Estimates Over Space
3 Applications of Spatial Regression Models
3.1 Spatial Error Models
3.2 SLX and SDEM Models
3.3 SAR and SDM Models
4 Conclusion
References
Heterogeneous Coefficient Spatial Regression Panel Models
1 Introduction
2 Independent Heterogeneous Coefficient Models
3 Spatial Autoregressive Processes and Dependence
4 Details Regarding the Heterogeneous Coefficient Spatial Autoregressive Model
5 Markov Chain Monte Carlo Estimation of the Heterogeneous Coefficients Spatial Autoregressive Panel Model
5.1 Bayesian Prior Information
5.2 A Matrix Exponential Spatial Specification
6 Interpreting Model Estimates
7 Applications of the Heterogeneous Coefficient Spatial Regression Panel Models
7.1 Gas Station Pricing Application
7.2 Regional Wage Curve Application
7.3 Knowledge Production Function
8 Conclusions
9 Cross-References
References
Endogeneity in Spatial Models
1 Introduction
2 Impact of Spatial Autocorrelation on Estimation and Testing for Endogeneity
3 Estimation of Spatial Models with Endogeneity
3.1 Instrumental Variables and GMM
3.1.1 Single-Equation Models
3.1.2 Multiple Equation Models
3.2 Models for Panel Data
3.3 Other Identification Strategies
4 Conclusion
5 Cross-References
References
Using Convex Combinations of Spatial Weights in Spatial Autoregressive Models
1 Introduction
1.1 Multiple Connectivity Matrices
1.2 Convex Combinations of Multiple Weight Matrices
1.3 Bayesian Estimation of Convex Combinations of Weight Matrix Models
2 Computational Issues in Estimating Convex Combination of Weights Models
2.1 MCMC Sampling of the Parameters Γ
2.2 An Approximation to the Log-Determinant
2.3 Efficient Calculation of Partial Derivatives for the Model
3 Bayesian Model Averaging for Multiple Models
4 Conclusion
5 Cross-References
References
Author Index
Subject Index


Manfred M. Fischer and Peter Nijkamp (Editors)

Handbook of Regional Science
Second and Extended Edition

With 238 Figures and 78 Tables

Editors Manfred M. Fischer Vienna University of Economics and Business Vienna, Austria

Peter Nijkamp Free University Faculty of Economics Amsterdam, The Netherlands

ISBN 978-3-662-60722-0 ISBN 978-3-662-60723-7 (eBook) ISBN 978-3-662-60724-4 (print and electronic bundle) https://doi.org/10.1007/978-3-662-60723-7 1st edition: © Springer-Verlag Berlin Heidelberg 2014 2nd edition: © Springer-Verlag GmbH Germany, part of Springer Nature 2021 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer-Verlag GmbH, DE part of Springer Nature. The registered company address is: Heidelberger Platz 3, 14197 Berlin, Germany

Contents

Volume 1

Section I: History of Regional Science
Regional Science Methods in Planning (Peter Batey)
A Reconnaissance Through the History of Shift-Share Analysis (Michael L. Lahr and João Pedro Ferreira)
City-Size Distribution: The Evolution of Theory, Evidence, and Policy in Regional Science (Gordon F. Mulligan and John I. Carruthers)
Classical Contributions: Von Thünen and Weber (Roberta Capello)
Market Areas and Competing Firms: History in Perspective (Folke Snickars)
Spatial Interaction Models: A Broad Historical Perspective (Johannes Bröcker)
Regional Science Methods in Business (Graham Clarke)
Worlds Lost and Found: Regional Science Contributions in Support of Collective Decision-Making for Collective Action (Kieran P. Donaghy)

Section II: Location and Interaction
Travel Behavior and Travel Demand (Kenneth Button)
Activity-Based Analysis (Harvey J. Miller)
Social Network Analysis (Richard Medina and Nigel Waters)
Land-Use Transport Interaction Models (Michael Wegener)
Network Equilibrium Models for Urban Transport (David Boyce)
Supply Chains and Transportation Networks (Anna Nagurney)
Computable Models of Dynamic Spatial Oligopoly from the Perspective of Differential Variational Inequalities (Terry L. Friesz and Amir H. Meimand)
Complexity and Spatial Networks (Aura Reggiani)
Factor Mobility and Migration Models (Johannes Bröcker and Timo Mitze)
Interregional Trade: Models and Analyses (Geoffrey J. D. Hewings and Jan Oosterhaven)
Interregional Input-Output Models (Jan Oosterhaven and Geoffrey J. D. Hewings)

Section III: Regional Housing and Labor Markets
Real Estate and Housing Markets (Dionysia Lambiri and Antonios Rovolis)
Housing Choice, Residential Mobility, and Hedonic Approaches (David M. Brasington)
Migration and Labor Market Opportunities (Michael J. Greenwood)
Job Search Theory (Alessandra Faggian)
Commuting, Housing, and Labor Markets (Jan Rouwendal)
Labor Market Theory and Models (Stephan J. Goetz)
Spatial Equilibrium in Labor Markets (Philip E. Graves)
Regional Employment and Unemployment (Francesca Mameli, Vassilis Tselios, and Andrés Rodríguez-Pose)
Spatial Mismatch, Poverty, and Vulnerable Populations (Laurent Gobillon and Harris Selod)

Section IV: Regional Economic Growth
Neoclassical Regional Growth Models (Maria Abreu)
Endogenous Growth Theory and Regional Extensions (Zoltan Acs and Mark Sanders)
Incorporating Space in the Theory of Endogenous Growth: Contributions from the New Economic Geography (Steven Bond-Smith and Philip McCann)
Demand-Driven Theories and Models of Regional Growth (William Cochrane and Jacques Poot)
Regional Growth and Convergence Empirics (Julie Le Gallo and Bernard Fingleton)
The Rise of Skills: Human Capital, the Creative Class, and Regional Development (Charlotta Mellander and Richard Florida)
Local Social Capital and Regional Development (Hans Westlund and Johan P. Larsson)
Infrastructure and Regional Economic Growth (Arthur Grimes)
Wellbeing and the Region (Philip S. Morrison)
Population Diversity and Regional Economic Growth (Annekatrin Niebuhr and Jan Cornelius Peters)
Spatial Policy for Growth and Equity: A Forward Look (Sandy Dall’erba and Irving Llamosas-Rosas)

Volume 2

Section V: Innovation and Regional Economic Development
The Geography of Innovation (Edward J. Malecki)
Systems of Innovation and the Learning Region (Philip Cooke)
Networks in the Innovation Process (Emmanouil Tranos)
The Geography of R&D Collaboration Networks (Thomas Scherngell)
Generation and Diffusion of Innovation (Börje Johansson)
Cities, Knowledge, and Innovation (Frank G. van Oort and Jan G. Lambooy)
Knowledge Flows, Knowledge Externalities, and Regional Economic Development (Charlie Karlsson and Urban Gråsjö)
Knowledge Spillovers Informed by Network Theory and Social Network Analysis (Maryann P. Feldman and Scott W. Langford)
Clusters, Local Districts, and Innovative Milieux (Michaela Trippl and Edward M. Bergman)

Section VI: Regional Policy
The Case for Regional Policy: Design and Evaluation (Carlos R. Azzoni and Eduardo A. Haddad)
Place-Based Policies: Principles and Developing Country Applications (Gilles Duranton and Anthony J. Venables)
New Trends in Regional Policy: Place-Based Component and Structural Policies (José-Enrique Garcilazo and Joaquim Oliveira Martins)
European Regional Policy: What Can Be Learned (Juan R. Cuadrado-Roura)
Regional Policies in East-Central Europe (Grzegorz Gorzelak)
Special Economic Zones and the Political Economy of Place-Based Policies (Christopher A. Hartwell)
Rural Development Policies at Stake: Structural Changes and Target Evolutions During the Last 50 Years (André Torre and Frederic Wallet)

Section VII: New Economic Geography, Evolutionary Economic Geography, and Regional Dynamics
Schools of Thought on Economic Geography, Institutions, and Development (Philip McCann)
New Economic Geography and the City (Carl Gaigné and Jacques-François Thisse)
New Economic Geography After 30 Years (Steven Brakman, Harry Garretsen, and Charles van Marrewijk)
Relational and Evolutionary Economic Geography (Harald Bathelt and Pengfei Li)
Path Dependence and the Spatial Economy: Putting History in Its Place (Ron Martin)
Agglomeration and Jobs (Gilles Duranton)
Changes in Economic Geography Theory and the Dynamics of Technological Change (Riccardo Crescenzi)
Geographical Economics and Policy (Henry G. Overman)
New Geography Location Analytics (Alan T. Murray)
Geography and Entrepreneurship (Martin Andersson and Johan P. Larsson)
The New Economic Geography in the Context of Migration (K. Bruce Newbold)
Environmental Conditions and New Geography: The Case of Climate Change Impact on Agriculture and Transportation (Sandy Dall’erba and Zhenhua Chen)
Social Capital, Innovativeness, and the New Economic Geography (Charlie Karlsson and Hans Westlund)

Section VIII: Environment and Natural Resources
Spatial Environmental and Natural Resource Economics (Amy W. Ando and Kathy Baylis)
Dynamic and Stochastic Analysis of Environmental and Natural Resources (Yacov Tsur and Amos Zemel)
Game Theoretic Modeling in Environmental and Resource Economics (Hassan Benchekroun and Ngo Van Long)
Methods of Environmental Valuation (John Loomis, Christopher Huber, and Leslie Richardson)
The Hedonic Method for Valuing Environmental Policies and Quality (Philip E. Graves)
Materials Balance Models (Gara Villalba Méndez and Laura Talens Peiró)
Climate Change and Regional Impacts (Daria A. Karetnikov and Matthias Ruth)
Urban and Regional Sustainability (Emily Talen)
Population and the Environment (Jill L. Findeis and Shadayen Pervez)

Volume 3

Section IX: Spatial Analysis and Geocomputation
Geographic Information Science (Michael F. Goodchild and Paul A. Longley)
Geospatial Analysis and Geocomputation: Concepts and Modeling Tools (Michael de Smith)
Bayesian Spatial Analysis (Chris Brunsdon)
Geovisualization (Ross Maciejewski)
Web-Based Tools for Exploratory Spatial Data Analysis (Linda See)
Spatio-temporal Data Mining (Tao Cheng, James Haworth, Berk Anbaroglu, Garavig Tanaksaranond, and Jiaqiu Wang)
Scale, Aggregation, and the Modifiable Areal Unit Problem (David Manley)
Spatial Network Analysis (Clio Andris and David O'Sullivan)
Cellular Automata and Agent-Based Models (Keith C. Clarke)
Spatial Microsimulation (Alison J. Heppenstall and Dianna M. Smith)
Fuzzy Modeling in Spatial Analysis (Yee Leung)

Section X: Spatial Statistics
Geostatistical Models and Spatial Interpolation (Peter M. Atkinson and Christopher D. Lloyd)
Spatial Sampling (Eric M. Delmelle)
Exploratory Spatial Data Analysis (Jürgen Symanzik)
Spatial Autocorrelation and Moran Eigenvector Spatial Filtering (Daniel Griffith and Yongwan Chun)
Geographically Weighted Regression (David C. Wheeler)
Bayesian Modeling of Spatial Data (Peter Congdon)
Spatial Models Using Laplace Approximation Methods (Virgilio Gómez-Rubio, Roger S. Bivand, and Håvard Rue)
Spatial Data and Spatial Statistics (Robert Haining and Guangquan Li)
Multivariate Spatial Process Models (Alan E. Gelfand)
Spatial Dynamics and Space-Time Data Analysis (Sergio J. Rey)
Spatial Clustering and Autocorrelation of Health Events (Lance A. Waller)

Section XI: Spatial Econometrics
Maximum Likelihood Estimation (R. Kelley Pace)
Bayesian Markov Chain Monte Carlo Estimation (Jeffrey A. Mills and Olivier Parent)
Instrumental Variables/Method of Moments Estimation (Ingmar R. Prucha)
Cross-Section Spatial Regression Models (Julie Le Gallo)
Spatial Panel Models and Common Factors (J. Paul Elhorst)
Limited and Censored Dependent Variable Models (Xiaokun (Cara) Wang)
Spatial Econometric OD-Flow Models (Christine Thomas-Agnan and James P. LeSage)
Interpreting Spatial Econometric Models (James P. LeSage and R. Kelley Pace)
Heterogeneous Coefficient Spatial Regression Panel Models (Yao-Yu Chih and James P. LeSage)
Endogeneity in Spatial Models (Julie Le Gallo and Bernard Fingleton)
Using Convex Combinations of Spatial Weights in Spatial Autoregressive Models (Nicolas Debarsy and James P. LeSage)

Author Index
Subject Index

About the Editors

Manfred M. Fischer is Professor of Economic Geography at Vienna University of Economics and Business and Adjunct Professor at the Chinese Academy of Sciences. He was co-founder of the prominent interdisciplinary journal Geographical Systems (Gordon & Breach) and is now Editor-in-Chief of its successor, the Journal of Geographical Systems (Springer). Professor Fischer also co-founded the Springer book series Advances in Spatial Science and served for eight years as Chair of the IGU Commission on Mathematical Models. In this role he was responsible for organizing several of the Commission’s highly successful international meetings around the world, including symposia in Shanghai and Beijing (1990). For more than 40 years he has consistently made significant contributions to regional science in general, and to spatial data analysis, geocomputation, and spatial econometrics in particular. Professor Fischer has published widely, both books and articles in dozens of academic journals, and some of his books are available in Chinese. On account of his scholarly achievements, he has been named a Fellow of the Regional Science Association International, the Royal Dutch Academy of Sciences, the Austrian Academy of Sciences, and the International Academy of Sciences for Europe and Asia. Moreover, Professor Fischer has received the highest accolades of the RSAI: the Jean Paelinck Award 2015 for outstanding scholarly achievements in the field of regional science methods, and the Founder’s Medal 2016, which is awarded every four years for significant lifetime contributions.

Peter Nijkamp is Professor of Regional and Urban Economics and Economic Geography at VU University and affiliated with the Jheronimus Academy of Data Science (JADS), ‘s-Hertogenbosch (the Netherlands); the Royal Institute of Technology (KTH), Stockholm (Sweden); and A. Mickiewicz University, Poznan (Poland). He serves on the editorial/advisory boards of more than 30 journals. According to the RePEc list, Professor Nijkamp is one of the 30 most well-known economists in the world. His main research interests cover quantitative plan evaluation, regional and urban modelling, multicriteria analysis, transport systems analysis, mathematical systems modelling, technological innovation, entrepreneurship, environmental and resource management, and sustainable development. In past years, he has focused his research in particular on new quantitative methods for policy analysis as well as on the spatial-behavioral analysis of economic agents. Professor Nijkamp has broad expertise in the areas of public policy, services planning, infrastructure management, and environmental protection. In all these fields he has published many books and numerous articles. He has been a visiting professor at many universities all over the world. He is Past President of the European Regional Science Association and of the Regional Science Association International. Professor Nijkamp is also a Fellow of the Royal Netherlands Academy of Sciences and its former Vice President. From 2002 to 2009, he served as President of the Governing Board of the Netherlands Research Council (NWO). In addition, he is Past President of the European Heads of Research Councils (EUROHORCs), a Fellow of the Academia Europaea, and a member of many international scientific organizations. He has acted regularly as an advisor to (inter)national bodies and (local and national) governments. In 1996, he was awarded the most prestigious scientific prize in the Netherlands, the Spinoza Award.

About the Section Editors

History of Regional Science: Prof. Peter Batey, Department of Geography and Planning, University of Liverpool, Liverpool, UK

Location and Interaction: Prof. Piet Rietveld (†), Free University Amsterdam, Amsterdam, The Netherlands; Prof. Manfred M. Fischer, Vienna University of Economics and Business, Vienna, Austria

Regional Housing and Labor Markets: Prof. Alessandra Faggian, Social Sciences, Gran Sasso Science Institute, L’Aquila, Italy; Dr. Marco Modica, Gran Sasso Science Institute, L’Aquila, Italy

Regional Economic Growth: Prof. Jacques Poot, Department of Spatial Economics, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands

Innovation and Regional Economic Development: Prof. Roberta Capello, Department Architecture, Built Environment and Construction Engineering A.B.C., Politecnico di Milano, Milan, Italy

Regional Policy: Prof. Juan R. Cuadrado-Roura, Department of Economics and Business Administration, Faculty of Economics and Business, University of Alcalá, Alcalá de Henares, Madrid, Spain

New Economic Geography, Evolutionary Economic Geography, and Regional Dynamics: Prof. Andrés Rodríguez-Pose, Department of Geography and Environment, London School of Economics, London, UK; Dr. Karima Kourtit, Open University, Heerlen, The Netherlands

Environment and Natural Resources: Dr. Amitrajeet A. Batabyal, Rochester Institute of Technology, Rochester, NY, USA

Spatial Analysis and Geocomputation: Prof. Paul A. Longley, Consumer Data Research Centre and Department of Geography, University College London, London, UK; Prof. Manfred M. Fischer, Vienna University of Economics and Business, Vienna, Austria

Spatial Statistics: Prof. Peter Congdon, School of Geography, Queen Mary University of London, London, UK

Spatial Econometrics: Prof. James P. LeSage, Department of Finance and Economics, McCoy College of Business Administration, Texas State University, San Marcos, TX, USA

Contributors

Maria Abreu University of Cambridge, Cambridge, UK Zoltan Acs Schar School of Policy and Government, George Mason University, Fairfax, VA, USA Berk Anbaroglu Department of Geomatics Engineering, Hacettepe University, Ankara, Turkey Martin Andersson Department of Industrial Economics, Blekinge Institute of Technology, Karlskrona, Sweden Amy W. Ando Department of Agricultural and Consumer Economics, University of Illinois at Urbana-Champaign, Urbana, IL, USA Clio Andris Department of Geography, The Pennsylvania State University, University Park, PA, USA Peter M. Atkinson Lancaster Environment Centre, University of Lancaster, Lancaster, UK Carlos R. Azzoni Department of Economics, University of Sao Paulo, Sao Paulo, Brazil Peter Batey Department of Geography and Planning, University of Liverpool, Liverpool, UK Harald Bathelt Department of Political Science and Department of Geography and Planning, University of Toronto, Toronto, ON, Canada Kathy Baylis Department of Agricultural and Consumer Economics, University of Illinois at Urbana-Champaign, Urbana, IL, USA Hassan Benchekroun Department of Economics and CIREQ, McGill University, Montréal, QC, Canada Edward M. Bergman Institute for Multi-Level Governance and Development, Vienna University of Economics and Business, Vienna, Austria


Roger S. Bivand Department of Economics, Norwegian School of Economics, Bergen, Norway Steven Bond-Smith Curtin Business School, Bankwest Curtin Economics Centre, Curtin University, Bentley, WA, Australia David Boyce Department of Civil and Environmental Engineering, Northwestern University, Evanston, IL, USA Steven Brakman Faculty of Economics and Business, University of Groningen, Groningen, The Netherlands David M. Brasington Department of Economics, University of Cincinnati, Cincinnati, OH, USA Johannes Bröcker Faculty of Business, Economics and Social Sciences, Institute for Environmental, Resource and Regional Economics, CAU Kiel University, Kiel, Germany Chris Brunsdon National Centre for Geocomputation, National University of Ireland Maynooth, Maynooth, Ireland Kenneth Button Schar School of Policy and Government, George Mason University, Arlington, VA, USA Roberta Capello Department Architecture, Built Environment and Construction Engineering A.B.C., Politecnico di Milano, Milan, Italy John I. Carruthers Department of City and Regional Planning, Cornell University, Ithaca, NY, USA Zhenhua Chen Department of City and Regional Planning, The Ohio State University, Columbus, OH, USA Tao Cheng SpaceTimeLab, Department of Civil, Environmental and Geomatic Engineering, University College, London, UK Yao-Yu Chih Department of Finance and Economics, McCoy College of Business Administration, Texas State University, San Marcos, TX, USA Yongwan Chun Geospatial Information Sciences, School of Economic, Political and Policy Sciences, University of Texas at Dallas, Richardson, TX, USA Graham Clarke School of Geography, University of Leeds, Leeds, UK Keith C. Clarke Department of Geography, University of California, Santa Barbara, Santa Barbara, CA, USA William Cochrane School of Social Sciences, University of Waikato, Hamilton, New Zealand


Peter Congdon School of Geography, Queen Mary University of London, London, UK Philip Cooke Centre for Advanced Studies, Cardiff University, Cardiff, UK Riccardo Crescenzi Department of Geography and Environment, London School of Economics, London, UK Juan R. Cuadrado-Roura Department of Economics and Business Administration, Faculty of Economics and Business, University of Alcalá, Alcalá de Henares, Madrid, Spain Sandy Dall’erba Department of Agricultural and Consumer Economics and Regional Economics Applications Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL, USA Michael de Smith Department of Geography, University College London, London, UK Nicolas Debarsy CNRS-LEM UMR 9221, Université de Lille 1, Lille, France Eric M. Delmelle Department of Geography and Earth Sciences, University of North Carolina at Charlotte, Charlotte, NC, USA Kieran P. Donaghy Department of City and Regional Planning, Cornell University, Ithaca, NY, USA Gilles Duranton The Wharton School, University of Pennsylvania, Philadelphia, PA, USA J. Paul Elhorst Department of Economics, Econometrics and Finance, University of Groningen, Groningen, The Netherlands Alessandra Faggian Social Sciences, Gran Sasso Science Institute, L’Aquila, Italy Maryann P. Feldman Department of Public Policy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA João Pedro Ferreira Edward J Bloustein School of Planning and Public Policy, Rutgers, the State University of New Jersey, New Brunswick, NJ, USA Jill L. Findeis Division of Applied Social Sciences, University of Missouri-Columbia, Population Research Institute, Pennsylvania State University, Columbia, MO, USA Bernard Fingleton Department of Land Economy, University of Cambridge, Cambridge, UK Richard Florida Rotman School of Management, University of Toronto, Toronto, ON, Canada


Terry L. Friesz Department of Industrial and Manufacturing Engineering, Pennsylvania State University, University Park, PA, USA Carl Gaigné INRA, UMR1302, SMART-LERECO, Rennes, France CREATE, Laval University, Québec, QC, Canada José-Enrique Garcilazo Centre for Entrepreneurship, SMEs, Regions and Cities, OECD, Paris, France Harry Garretsen Faculty of Economics and Business, University of Groningen, Groningen, The Netherlands Alan E. Gelfand Department of Statistical Science, Duke University, Durham, NC, USA Laurent Gobillon Paris School of Economics – CNRS, Paris, France Stephan J. Goetz Northeast Regional Center for Rural Development and Department of Agricultural Economics, Sociology and Education, Pennsylvania State University, University Park, PA, USA Virgilio Gómez-Rubio Department of Mathematics, School of Industrial Engineering-Albacete, University of Castilla-La Mancha, Albacete, Spain Michael F. Goodchild Center for Spatial Studies and Department of Geography, University of California, Santa Barbara, CA, USA Grzegorz Gorzelak Centre for European Regional and Local Studies (EUROREG), University of Warsaw, Warsaw, Poland

Urban Gråsjö School of Business, Economics and IT, University West, Trollhättan, Sweden Philip E. Graves Department of Economics, University of Colorado, Boulder, CO, USA Michael J. Greenwood Department of Economics, University of Colorado, Boulder, CO, USA Daniel Griffith Geospatial Information Sciences, School of Economic, Political and Policy Sciences, University of Texas at Dallas, Richardson, TX, USA Arthur Grimes Motu Economic and Public Policy Research, Wellington, New Zealand Eduardo A. Haddad Department of Economics, University of Sao Paulo, Sao Paulo, Brazil Robert Haining Department of Geography, University of Cambridge, Cambridge, UK Christopher A. Hartwell Department of Accounting, Finance and Economics, Bournemouth University, Bournemouth, UK


James Haworth SpaceTimeLab, Department of Civil, Environmental and Geomatic Engineering, University College, London, UK Alison J. Heppenstall School of Geography, University of Leeds, Leeds, UK Geoffrey J. D. Hewings Regional Economics Applications Laboratory, University of Illinois, Urbana-Champaign, IL, USA Christopher Huber U.S. Geological Survey, Fort Collins Science Center, Fort Collins, CO, USA Börje Johansson Department of Economics, Jönköping International Business School (JIBS), Jönköping, Sweden Daria A. Karetnikov University of Maryland, College Park, MD, USA Charlie Karlsson Economics, Jönköping International Business School, Jönköping, Sweden

R. Kelley Pace Department of Finance, E.J. Ourso College of Business Administration, Louisiana State University, Baton Rouge, LA, USA Michael L. Lahr Edward J Bloustein School of Planning and Public Policy, Rutgers, the State University of New Jersey, New Brunswick, NJ, USA Dionysia Lambiri Geography and Environmental Science, University of Southampton Highfield, Southampton, UK Jan G. Lambooy Department of Economic Geography, Faculty of Geosciences, Utrecht University, Utrecht, The Netherlands Scott W. Langford Department of Public Policy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA Johan P. Larsson Department of Land Economy, University of Cambridge, Cambridge, UK Julie Le Gallo CESAER UMR 1041, AgroSup Dijon, INRA, Université de Bourgogne Franche-Comté, Dijon, France James P. LeSage Department of Finance and Economics, McCoy College of Business Administration, Texas State University, San Marcos, TX, USA Yee Leung Institute of Future Cities, Department of Geography and Resource Management, The Chinese University of Hong Kong, Hong Kong, People’s Republic of China Pengfei Li Department of International Business, HEC Montréal, Montréal, QC, Canada Guangquan Li Department of Mathematics, Physics and Electrical Engineering, Northumbria University, Newcastle upon Tyne, UK


Irving Llamosas-Rosas General Direction of Economic Research, Regional Delegation of Guadalajara, Bank of Mexico, Guadalajara, Mexico Christopher D. Lloyd School of Natural and Built Environment, Queen’s University Belfast, Belfast, UK Paul A. Longley Consumer Data Research Centre and Department of Geography, University College London, London, UK John Loomis Department of Agricultural and Resource Economics, Colorado State University, Fort Collins, CO, USA Ross Maciejewski Arizona State University, Tempe, AZ, USA Edward J. Malecki Department of Geography, Ohio State University, Columbus, OH, USA Francesca Mameli Dipartimento di Scienze Economiche e Aziendali and CRENoS, Università degli Studi di Sassari, Sassari, Italy David Manley School of Geographical Sciences, University of Bristol, Bristol, UK Ron Martin Department of Geography and St. Catharine's College, University of Cambridge, Cambridge, UK Philip McCann University of Sheffield Management School, Sheffield, UK Richard Medina Department of Geography, University of Utah, Salt Lake City, UT, USA Amir H. Meimand Zilliant, Austin, TX, USA Charlotta Mellander Jönköping International Business School, Jönköping, Sweden Harvey J. Miller Department of Geography, The Ohio State University, Columbus, OH, USA Jeffrey A. Mills Department of Economics, University of Cincinnati, Cincinnati, OH, USA Timo Mitze Department of Business and Economics, University of Southern Denmark, Sønderborg, Denmark Philip S. Morrison School of Geography, Environment and Earth Sciences, Victoria University of Wellington, Wellington, New Zealand Gordon F. Mulligan School of Geography and Development, University of Arizona, Tucson, AZ, USA Alan T. Murray Department of Geography, University of California at Santa Barbara, Santa Barbara, CA, USA


Anna Nagurney Department of Operations and Information Management, Isenberg School of Management, University of Massachusetts, Amherst, MA, USA K. Bruce Newbold School of Geography and Earth Sciences, McMaster University, Hamilton, ON, Canada Annekatrin Niebuhr IAB Nord, Regional Research Network of the Institute for Employment Research, Institute for Employment Research, Kiel, Germany Joaquim Oliveira Martins Centre for Entrepreneurship, SMEs, Regions and Cities, OECD, PSL-University of Paris Dauphine, Paris, France Jan Oosterhaven Faculty of Economics and Business, University of Groningen, Groningen, The Netherlands David O’Sullivan School of Geography, Environment and Earth Science, Victoria University of Wellington, Wellington, New Zealand Henry G. Overman Department of Geography and Environment, London School of Economics and Political Science, London, UK R. Kelley Pace Department of Finance, E. J. Ourso College of Business Administration, Louisiana State University, Baton Rouge, LA, USA Olivier Parent Department of Economics, University of Cincinnati, Cincinnati, OH, USA Shadayen Pervez Division of Applied Social Sciences, University of MissouriColumbia, Population Research Institute, Pennsylvania State University, Columbia, MO, USA Jan Cornelius Peters Institute of Rural Studies, Johann Heinrich von Thünen Institute, Braunschweig, Germany Jacques Poot Department of Spatial Economics, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands Ingmar R. Prucha Department of Economics, University of Maryland, College Park, MD, USA Aura Reggiani Department of Economics, University of Bologna, Bologna, Italy Sergio J. Rey Center for Geospatial Sciences, University of California, Riverside, CA, USA Leslie Richardson National Park Service, Social Science Program, Fort Collins, CO, USA Andrés Rodríguez-Pose Department of Geography and Environment, London School of Economics, London, UK Jan Rouwendal Department of Spatial Economics, VU University, Amsterdam, The Netherlands


Antonios Rovolis Department of Economic and Regional Development, Panteion University of Athens, Kallithea, Greece Håvard Rue CEMSE Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia Matthias Ruth University of Alberta, Edmonton, AB, Canada Mark Sanders Utrecht School of Economics, Utrecht University, Utrecht, The Netherlands Thomas Scherngell Center for Innovation Systems and Policy, AIT Austrian Institute of Technology, Vienna, Austria Linda See Ecosystem Services and Management Program, International Institute for Applied Systems Analysis (IIASA), Laxenburg, Austria Harris Selod The World Bank, Washington, DC, USA Dianna M. Smith University of Southampton, Southampton, UK Folke Snickars Department of Urban Planning and the Environment, KTH Royal Institute of Technology, Stockholm, Sweden Jürgen Symanzik Department of Mathematics and Statistics, Utah State University, Logan, UT, USA Emily Talen Social Sciences Division, University of Chicago, Chicago, IL, USA Laura Talens Peiró Social Innovation Centre, INSEAD, Fontainebleau, France Garavig Tanaksaranond Department of Survey Engineering, Chulalongkorn University, Bangkok, Thailand Jacques-François Thisse CORE, Université catholique de Louvain and CEPR, Louvain-la-Neuve, Belgium Christine Thomas-Agnan T.S.E.-R., Toulouse School of Economics, Toulouse, France André Torre INRA – AgroParistech, University Paris-Saclay, UMR SAD-APT, Paris, France Emmanouil Tranos School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, UK Michaela Trippl Department of Geography and Regional Research, University of Vienna, Vienna, Austria Vassilis Tselios Planning and Regional Development, University of Thessaly, Volos, Greece Yacov Tsur Department of Environmental Economics and Management, The Hebrew University of Jerusalem, Rehovot, Israel


Ngo Van Long Department of Economics and CIREQ, McGill University, Montréal, QC, Canada Charles van Marrewijk Utrecht University School of Economics, Utrecht, The Netherlands Frank G. van Oort Erasmus School of Economics, Erasmus University Rotterdam, Rotterdam, The Netherlands Anthony J. Venables Department of Economics, University of Oxford, Oxford, UK Gara Villalba Méndez Universitat Autònoma de Barcelona, Bellaterra, Spain Lance A. Waller Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA Frederic Wallet INRA – AgroParistech, University Paris-Saclay, UMR SAD-APT, Paris, France Jiaqiu Wang National Institute for Cardiovascular Outcomes Research, Barts NHS Trust, London, UK Xiaokun (Cara) Wang Department of Civil and Environmental Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA Nigel Waters Department of Geography, University of Calgary, Calgary, AB, Canada Michael Wegener Spiekermann and Wegener, Urban and Regional Research, Dortmund, Germany Hans Westlund Department of Urban Planning and Environment, KTH Royal Institute of Technology, Stockholm, Sweden David C. Wheeler Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA Amos Zemel Department of Solar Energy and Environmental Physics, Jacob Blaustein Institutes for Desert Research, Ben Gurion University of the Negev, Beersheba, Israel

Regional Science in Perspective: Editorial Introduction

Prologue

This Handbook of Regional Science, with 110 chapters and more than 2,300 pages, offers a representative and kaleidoscopic overview of the field of contemporary regional science. Its great diversity is also mirrored in the disciplinary composition of the 170 contributing authors, who have their roots in different disciplines, such as (regional and urban) economics, economic and social geography, architecture and planning, political science and public administration, demography and sociology, mathematics, statistics and econometrics, ecology and environmental science, and so forth. Any search for a uniform and generally accepted scientific paradigm in regional science, in terms of theory or methodology, would be in vain. Each of the constituent disciplines in regional science brings in its own theoretical, conceptual, or methodological framework. Of course, such scientific approaches may sometimes overlap. The added value of regional science as a “portfolio” discipline is its bridging principle: combining different but compatible – though often complementary – approaches in one research domain enhances our insights into the interdependent complexity of the space-economy. Regional science is not an archipelago consisting of different and disconnected islands. The integrating nature of regional science does not rest on one common scientific constellation, but on a joint use of appropriate research frames that enhance our understanding of spatial processes and structures, interdependencies, and governance issues in regions and cities on our planet. After a discussion of the aims and scope of this Handbook, we will briefly sketch the organization and the key substance of the eleven sections and the 110 constituent chapters of this multi-volume opus. In the final section of this introductory chapter we will attempt to identify some formal textual commonalities in the different chapters of this Handbook by searching for substantive concepts and terms commonly used across the chapters.

Aims and Scope

The Handbook of Regional Science is a multi-volume reference work providing state-of-the-art knowledge on regional science, written by renowned scientists in the field. It is part of the prestigious “Springer Reference Library” that appears not only in book format but also online as fully searchable and hyperlinked eReferences on SpringerLink. An important feature of Springer References is that they mainly represent tertiary literature, containing digested and established knowledge presented in an accessible format. The level of each chapter is such that not only a graduate student or a postdoc researcher but also a more advanced regional scientist can benefit from a chapter outside his or her own area of expertise. Each chapter is also a stand-alone entity that covers a particular topic without assuming that the reader will need to consult another part of the reference work on the same topic. This reference work addresses theory, methodology, application, and policy in one opus and is intended to serve the academic needs of graduate students as well as junior and senior scientists in regional science and related fields with an interest in studying social science phenomena and processes that have an explicit spatial dimension. The work is specifically intended to function as an instructional guide for students and established scholars in regional science, offering ease of use and ready access to the literature when needed.

Structure of the Handbook

The material in this Handbook has been chosen to provide a systematic state-of-the-art presentation of the diversity of current and emergent research topics, theories and models, and methods and techniques, not available elsewhere despite many excellent journal and book publications. The international group of authors was selected for their knowledge of a subject area and their ability to communicate basic information in their subject area succinctly and accessibly. This second edition of the Handbook of Regional Science has changed considerably since the first edition in 2014, most evident in the addition of two new sections, one on the history of regional science and one on regional policy. Most chapters have been updated to incorporate new developments, while others have been rewritten by different authors with fresh perspectives on the material concerned. There are approximately 30 new chapters, written by leading scholars in the field, covering topics such as knowledge spillovers, R&D collaboration networks, regional policy assessments, big data issues, dynamic modeling approaches, web-based tools for exploratory spatial data analysis, fuzzy modeling, multivariate spatial process models, endogeneity in spatial models, and heterogeneous spatial autoregressive models, among others. These additions largely reflect developments that have come to the fore or gained importance since the first edition. The reference work now contains 110 chapters and is subdivided into eleven sections, ranging from the history of regional science through regional housing and labor markets to spatial analysis and geocomputation, spatial statistics, and spatial econometrics. The sections are categorized as follows, with the section editors along with the responsible editor-in-chief (EiC) given in parentheses:


• History of Regional Science (Peter Batey; EiC: PN)
• Location and Interaction (Piet Rietveld†, Manfred M. Fischer; EiC: MMF)
• Regional Housing and Labor Markets (Alessandra Faggian, Marco Modica; EiC: MMF)
• Regional Economic Growth (Jacques Poot; EiC: PN)
• Innovation and Regional Economic Development (Roberta Capello; EiC: MMF)
• Regional Policy (Juan Cuadrado-Roura; EiC: PN)
• New Economic Geography, Evolutionary Economic Geography, and Regional Dynamics (Karima Kourtit, Andrés Rodríguez-Pose; EiC: PN)
• Environmental and Natural Resources (Amitrajeet Batabyal; EiC: MMF)
• Spatial Analysis and Geocomputation (Paul Longley, Manfred M. Fischer; EiC: MMF)
• Spatial Statistics (Peter Congdon; EiC: MMF)
• Spatial Econometrics (James LeSage; EiC: MMF)

The various sections in this Handbook will now be concisely introduced.

History of Regional Science

Regional science has over the past half century turned into a broad multidisciplinary orientation on regional and urban issues, combining – and being a complement to – urban and regional economics, social and economic geography, transportation science, environmental science, political science, and planning theory. Regional science has also developed a powerful scientific toolbox that is nowadays used in many disciplines all over the world. Regional science as a broadly recognized scientific domain has been brought to fruition only over the last 60 years. This does not imply that there was no interest in spatial issues in earlier times. On the contrary, Adam Smith, the grandfather of economics, already analyzed the relationship between location and trade, emphasizing the importance of accessibility and spatial connectivity. Many other classical scholars, such as Ricardo, Malthus, and Quesnay, addressed – often implicitly – important issues of the space-economy. Of course, there is also a range of recognized predecessors of regional science, in particular von Thünen, Weber, Palander, Predöhl, Christaller, and Lösch. But the real history of regional science started with the seminal contributions of Walter Isard, who laid the foundation for a rigorous, analytically oriented regional science from the mid-1950s onward. The framework he developed had a theoretical foundation, a strong methodological orientation, and a strong emphasis on applied modeling of real-world phenomena and processes, seen from a multidisciplinary perspective. Notably, Isard made original contributions not only to regional science in a strict sense but also to ecological science, transportation science, and even conflict management. His approach is a perfect example of the multidisciplinary nature of regional science. This trans-disciplinary character is also the key feature of the present multi-volume reference work. The contributors of the various chapters originate from the several disciplines that together make up regional science. These chapters follow the strict methodological requirements imposed in the early genesis of regional science, in which quantitative analysis and a multidisciplinary approach are key features.

The first section in the revised and updated version of the Handbook is a new section on the History of Regional Science and consists of eight chapters. As mentioned above, regional science has a respectable history, so broad in scope and nature that it is not possible to provide a comprehensive record of all achievements. This section therefore offers a selection of highlights in regional science thinking from its rich history. The chapters provide a broad coverage of themes that include the history of regional science concepts, methods, models, theories, and fields of application. The opening chapter, written by Peter Batey, examines the evolving relationship between regional science methods and planning since the mid-1950s. In the early days of regional science, spatial planning was seen as a valid outlet for much of the analytical work carried out by regional scientists, but over the years a gap has emerged. The author argues that regional science may make an important contribution to planning analysis through an open-minded interdisciplinary dialogue and a concerted effort to make models and methods more user-friendly. Next, there is a comprehensive account by Michael Lahr and João Pedro Ferreira on shift-share analysis, one of the most durable regional science methods. This approach has become a popular analytical and forecasting tool, incorporating time and space dimensions. The authors point out the need for a theoretical foundation and for methodological linkages with regression and input-output analysis. In a subsequent chapter, written by Gordon Mulligan and John Carruthers, the concept of city-size distribution, a focal point of regional science interest, is discussed, addressing the evolution of thought in the areas of theory, empirical testing, and policy formation in different time periods. The study of dynamic equilibria in spatial systems of cities in relation to economies of scale and transportation costs also provides interesting links to the new economic geography treated in a later section of this Handbook. Location theory is one of the established cornerstones of regional science, treated directly and indirectly in other sections of this reference work. In the chapter that follows, Roberta Capello offers a description of classical location theories and re-examines important constituents of locational analysis. She extends the economic logic of location theory towards communication costs and the “death of distance” notion. Folke Snickars then discusses another important element of location theory, namely market area analysis and the existence of competing firms. After an exposition of earlier theoretical contributions, the author presents a series of behavioral models of location games between sellers and customers, using a cellular automata framework as the basis for simulating equilibrium states of seller-customer systems.
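For orientation, the classical shift-share identity that underlies much of the work Lahr and Ferreira survey decomposes regional employment change by industry into three components; the following is a standard textbook statement rather than a formula reproduced from their chapter:

$$ \Delta e_{ir} \;=\; \underbrace{e_{ir}\,g}_{\text{national share}} \;+\; \underbrace{e_{ir}\,(g_i - g)}_{\text{industry mix}} \;+\; \underbrace{e_{ir}\,(g_{ir} - g_i)}_{\text{regional shift}} $$

Here $e_{ir}$ is base-period employment in industry $i$ of region $r$, $g$ is the national all-industry growth rate, $g_i$ is the national growth rate of industry $i$, and $g_{ir}$ is the region-specific growth rate of industry $i$; summing over industries yields the region's total employment change.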
In the subsequent chapter, Johannes Bröcker focuses on another major analytical strand in the regional science literature, namely spatial interaction models. He shows how the derivation of such models has evolved from a variety of methodological perspectives, ranging from social physics to microeconomic general spatial equilibrium. Finally, he argues that the rich and systemic scope of spatial interaction models has also favored their use in policy evaluation. The two final chapters in this History section address two important analytical foci in regional science in relation to planning and decision-making. First, Graham Clarke zooms in on current regional science methods in business and commerce. The author explores three different scales – macro, meso, and micro – and then focuses on various applications at each of these scale levels, ranging from gravity analysis to microsimulation. The final chapter in this section is provided by Kieran Donaghy, who returns to the links between regional science and planning by offering a survey of contributions by regional scientists that help spatial planners and urban designers engage in collective decision-making and design, leading to collective action and effective planning that addresses real-world concerns with practicable approaches.
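As a compact point of reference for the family of models Bröcker surveys, the unconstrained gravity model and its entropy-maximizing, doubly constrained refinement are usually written as follows; this is a standard textbook formulation, not one taken from the chapter itself:

$$ T_{ij} = k\,O_i\,D_j\,f(c_{ij}), \qquad T_{ij} = A_i\,O_i\,B_j\,D_j\,e^{-\beta c_{ij}}, $$

$$ A_i = \Big[\sum_j B_j D_j e^{-\beta c_{ij}}\Big]^{-1}, \qquad B_j = \Big[\sum_i A_i O_i e^{-\beta c_{ij}}\Big]^{-1}, $$

where $T_{ij}$ is the flow from origin $i$ to destination $j$, $O_i$ and $D_j$ are origin and destination totals, $c_{ij}$ is a travel cost or distance measure, $f$ is a deterrence function (commonly $e^{-\beta c_{ij}}$ or $c_{ij}^{-\beta}$), and the balancing factors $A_i$ and $B_j$ enforce the constraints $\sum_j T_{ij} = O_i$ and $\sum_i T_{ij} = D_j$.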

Location and Interaction

Spatial interdependencies between locations have always been at the heart of regional science research. Such interdependencies are clearly related to transport flows and mobility patterns, but they go much further. The section on Location and Interaction provides an overview of spatial interaction concepts, models, and accompanying methodologies. There are eleven chapters in this section. The first chapter, written by Kenneth Button, focuses attention on the ways in which travel behavior and demand are analyzed within the framework of regional science. While the main methods of modeling travel behavior and demand are outlined and critiqued, specific attention is given to the practical aspects of applying travel behavior and demand analysis to subjects such as regional development, infrastructure investment, and congestion analysis. In the next chapter, Harvey Miller discusses activity-based analysis, an approach to understanding transportation, communication, urban, and related social and physical systems that takes individual actions in space and time as its basis, along with the corresponding methodologies. Subsequently, Richard Medina and Nigel Waters offer a concise review of social network analysis and elucidate how social network analytics can be, and is, used to influence and understand the spatial manifestations of economic and political activity across regions in continuous and network space. The major social networks now extant are described, and their current activities are discussed, including the topics of micro-targeting, transparency, privacy, proprietary algorithms, and “fake news” and its synonyms such as post-truth. Policy implications for influencing and governing these activities, such as taxation and regulation, are reviewed.

The relationship between urban development and transport is not simple, but complex and closely linked to urban processes such as macroeconomic development, inter-regional migration, demography, household formation, and technological innovation. In his chapter, Michael Wegener discusses one segment of this complex relationship by focusing on integrated models of urban land use and transport. Such models explicitly capture the two-way interaction between land use and transport in order to forecast the likely impacts of land use and transportation policies for decision support in urban planning. In a subsequent chapter, David Boyce presents one approach to the construction of models of urban travel choices and, implicitly, location choices. Beginning with the simple route choice problem faced by vehicle operators in a congested urban road network, he relaxes exogenous constants and replaces them with additional assumptions and fewer constants, leading toward a more general forecasting method. Anna Nagurney next presents some of the major advances in modeling supply chains and transportation networks. Moreover, the author discusses how the concepts of system-optimization and user-optimization have underpinned transportation network models, and how they have evolved to enable the formulation of supply chain network problems operating under centralized or decentralized, that is, competitive, decision-making behavior. The chapter highlights some of the principal methodologies, including variational inequality theory, that have enabled the development of advanced transportation network equilibrium models as well as supply chain network equilibrium models. Similar in spirit is the chapter on computable models of dynamic spatial oligopoly, written by Terry Friesz and Amir Meimand. The chapter begins with the basic definition of Nash equilibrium and the formulation of static spatial and network oligopoly as variational inequalities and then moves to dynamic network oligopoly models. The authors show that the differential Nash game describing competitive network oligopoly can be formulated as a differential variational inequality involving both control and state variables. In the subsequent chapter, Aura Reggiani reflects on complexity theory and on models able to map out complex interconnected spatial networks, from both a conceptual and an applied perspective.

Johannes Bröcker and Timo Mitze then introduce the theory of labor and capital movements between regions or countries. Movements of other mobile production factors, in particular knowledge, are not dealt with. The chapter explains the basic factor mobility model assuming perfect competition and full factor price flexibility, with particular emphasis given to the welfare results. After studying factor mobility in a static framework, the analysis is extended to a dynamic framework. The chapter also studies factor mobility in the new economic geography and briefly points to the role of the public sector in the analysis of factor mobility. Inter-regional trade – representing the transfer of goods between individuals across regional borders – is a research topic that has been largely neglected by trade analysts. A dearth of data has limited formal exploration of inter-regional trade, but the volume of data now available invites greater attention to this form of connectivity between economies. The chapter, written by Geoffrey Hewings and Jan Oosterhaven, starts with a brief review of the theory and tests of international trade and its link to new economic geography, and then moves on to examine some alternative approaches to the measurement of trade, especially the role of intra-industry as opposed to inter-industry trade, vertical specialization, trade overlap, and spatial product cycles.

Regional Science in Perspective: Editorial Introduction

XXXIII

the same authors, shifts the focus on inter-regional input-output models that have played a major continuous role in the history of regional science.
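The mechanics underlying such input-output models are worth making concrete. The sketch below is not taken from the Hewings and Oosterhaven chapters; it is a minimal illustration, with invented coefficients, of the standard Leontief accounting identity x = (I − A)⁻¹ f that inter-regional input-output models elaborate.

```python
import numpy as np

# Hypothetical technical coefficients matrix A for four sector-region
# combinations: entry A[i, j] is the input from i needed per unit of
# output of j (illustrative numbers only, not from the Handbook).
A = np.array([
    [0.20, 0.10, 0.05, 0.00],
    [0.15, 0.25, 0.00, 0.05],
    [0.05, 0.00, 0.20, 0.10],
    [0.00, 0.05, 0.15, 0.25],
])

# Final demand vector f for the four sector-region combinations.
f = np.array([100.0, 80.0, 60.0, 40.0])

# Gross output x solves x = Ax + f, i.e. x = (I - A)^{-1} f
# (the Leontief inverse applied to final demand).
x = np.linalg.solve(np.eye(4) - A, f)
print(np.round(x, 1))
```

In an inter-regional setting, the off-diagonal blocks of A record trade-mediated input flows between regions, which is precisely the form of connectivity between economies that these chapters emphasize.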

Regional Housing and Labor Markets

Housing and labor market research has been a focal point of attention in the history of regional science and has generated important contributions to economic theorizing and analysis. The section on Regional Housing and Labor Markets is composed of nine chapters. The first chapter, written by Dionysia Lambiri and Antonios Rovolis, offers a comprehensive review of the fundamental concepts and key analytical tools relevant for real estate and housing market research. While the definition and description of regional housing markets are complicated, the nature and causes of residential mobility are somewhat more clearly understood. In a subsequent chapter, David Brasington examines the state of the literature on residential mobility and house price hedonics. This chapter discusses the theoretical framework, looks at the empirical approaches used to test theory, and considers briefly the factors influencing mobility and home prices. Whereas the theory behind second-stage demand regressions stemming from house price dynamics continues to receive some attention, the theory behind residential mobility and first-stage home price hedonics seems to be relatively stable. It is noteworthy that the empirical side of housing markets has received much more exploration in recent decades. Subsequently, Michael Greenwood discusses the role of economic opportunities in the study of migration. From the earliest years of internal migration as a recognized field of study, scholars in many social science disciplines believed that such economic factors are key determinants of migration. During the 1980s and beyond, the availability of microdata reflecting personal employment status and household income prompted numerous advances in our understanding of various migration phenomena and also helped to clear up dilemmas arising from earlier studies that used aggregate data. Search behavior is an important issue in the realm of housing and labor market research, including residential mobility and location decisions as well as job-choice problems. In her chapter, Alessandra Faggian outlines the standard model of search in a labor market context along with various extensions. In the subsequent chapter, Jan Rouwendal discusses the relations between housing and labor markets, with a specific focus on the role of commuting. Next, Stephan Goetz reviews labor supply, demand, and equilibrium topics with the goal of showing how they determine labor market outcomes across geographic space. Labor supply curves are based on utility-maximizing choices between working and leisure, subject to a budget constraint, whereas labor demand curves are derived from the firm's production function assuming profit-maximizing behavior. The author examines the challenges of defining and empirically delimiting labor market areas from historical perspectives and using statistical cluster analysis, with commuting data serving as a key tool.
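The textbook logic behind the labor supply curve can be illustrated with a stylized derivation (a standard sketch, not drawn from Goetz's chapter): assume Cobb-Douglas preferences over consumption \(c\) and leisure \(l\), a wage \(w\), no non-labor income, and a time endowment \(T\),

\[
\max_{c,\,l}\; \alpha \ln c + (1-\alpha)\ln l
\quad \text{subject to} \quad c = w\,(T - l).
\]

Substituting the budget constraint and setting the derivative with respect to \(l\) to zero gives

\[
\frac{1-\alpha}{l} = \frac{\alpha}{T-l}
\;\;\Rightarrow\;\;
l^{*} = (1-\alpha)\,T,
\qquad
h^{*} = T - l^{*} = \alpha T .
\]

In this special case, hours worked do not respond to the wage because income and substitution effects exactly cancel; richer specifications with non-labor income or other utility functions deliver the upward-sloping supply curves described above.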
Over long periods of human history, labor market equilibrium involved movements from low-wage areas to high-wage areas, a form of arbitrage under the implicit view that wage differentials corresponded to utility differentials, as Philip Graves argues in his chapter on labor market theories and models. This "labor economics" view is likely to be viable as long as movement and information costs are high, and under this view the movements would be expected to cause wage convergence over space. But given recent failures to observe convergence in wage rates, the author suggests that an alternative view – assuming a utility equilibrium over space – might better predict and explain labor market equilibrium. In this view, the question whether the spatial equilibrium in labor markets involves convergence or divergence becomes an analytically complicated issue. The long-term persistence of pronounced unemployment differentials between regions represents one of the most crucial issues confronting regional theory and policy. The chapter by Francesca Mameli, Vassilis Tselios, and Andrés Rodríguez-Pose reviews the theoretical and empirical literature on regional employment and unemployment research, with a focus on the causes of regional disparities in unemployment rates and on potential explanations. Finally, Laurent Gobillon and Harris Selod turn their attention to the literature on the spatial mismatch hypothesis – originally developed by the economist John Kain, with an initial and exclusive focus on the African-American poor in inner cities – a hypothesis that relates unemployment and poverty to the structure of cities.

Regional Economic Growth

Our understanding of regional economic growth has improved enormously in the past decades. Much of the existing work on regional growth has consisted of efforts to understand how factors beyond capital accumulation and technological change can affect regional development. This very broad conception of regional growth processes is reflected in the section on Regional Economic Growth. The opening chapter by Maria Abreu surveys the neoclassical model of growth and discusses its key features, predictions, and limitations. In the late 1980s and 1990s, new models of endogenous growth questioned the neoclassical emphasis on capital accumulation as the main engine of regional growth, emphasizing instead the Schumpeterian idea that growth is primarily driven by innovations that are often themselves the result of profit-motivated research activities. Zoltan Acs and Mark Sanders then review endogenous growth theory and its regional extensions, while Steven Bond-Smith and Philip McCann describe how the new economic geography and growth literature incorporates space into the standard Grossman and Helpman product variety model of endogenous growth. Next, the chapter by William Cochrane and Jacques Poot turns to a range of alternative perspectives, broadly referred to as demand-driven theories and models, which have their origin in Keynesian economics. Julie LeGallo and Bernard Fingleton proceed to provide a survey and synthesis of econometric tools to study regional economic growth and convergence. An important aspect of this chapter is the diversity of approaches that have been developed to link regional growth theory to data. The authors focus on two main strands of growth empirics: the regression approach, where predictions from formal neoclassical and other growth theories have been tested using cross-sectional and panel data, and the distribution approach, which typically examines the entire distribution of output per capita across regions. The chapter illustrates that regional growth empirics has stimulated the development of new spatial econometric tools to address the specific data implications of various growth theories. The section continues with a range of growth mechanisms. Some of these have to do with the role of human capital in economic growth. The work on human capital has greatly expanded our understanding of the nature, role, and mechanics of regional economic growth and development. The chapter by Carlotta Mellander and Richard Florida surveys this line of research, which can be divided into two main thrusts: studies that focus on and measure human capital in terms of educational attainment and those that measure skills via occupations. Other mechanisms fall outside the domain of the neoclassical model and are related to issues such as political and economic institutions and social capital. Hans Westlund and Johan Larsson then discuss the role of local social capital in regional economic growth, while Annekatrin Niebuhr and Cornelius Peters consider the economic effects of population diversity, with special emphasis on the impact of diversity on regional productivity and growth. Infrastructure is an integral factor supporting regional growth. The chapter by Arthur Grimes outlines two models for analyzing the relationship between infrastructure and regional growth. The first model adopts a standard spatial equilibrium approach and shows that the effect of new infrastructure on regional activity depends on its direct impacts on local productivity, local amenities, and the price of non-traded goods. The second model treats a major infrastructure investment as a real option that gives private sector developers the option, but not the obligation, for further development. The attention being given to well-being as a possible complement to conventional measures of economic growth constitutes one of the notable turning points in our measurement of progress. The chapter by Philip Morrison focuses on the relationship between regional economic growth and well-being and discusses the conceptual and empirical challenges that subjective measures of well-being pose for regional science. In spite of the ongoing efforts that several countries have made to promote more balanced economic development within their territories, neither economic growth theory nor the empirical evidence comes to a unanimous conclusion on the effectiveness of policy intervention. The final chapter of the section, written by Sandy Dall'erba and Irving Llamosas-Rosas, discusses the various strands of the theoretical literature, analyzes the results of empirical estimations in Europe and the USA, and provides recommendations for future research in this field.
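To see what the regression strand of growth empirics looks like in its simplest form, here is a minimal sketch of an unconditional beta-convergence regression on synthetic data (all numbers and variable names are invented for illustration; the chapter itself covers far richer spatial econometric specifications).

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic cross-section of 50 regions (illustrative only): initial
# log output per capita and subsequent average annual growth.
log_y0 = rng.normal(10.0, 0.5, size=50)
growth = 0.30 - 0.02 * log_y0 + rng.normal(0.0, 0.005, size=50)

# Classic unconditional beta-convergence regression:
#   growth_i = alpha + beta * log(y_{i,0}) + e_i
# beta < 0 indicates that initially poorer regions grow faster.
X = np.column_stack([np.ones_like(log_y0), log_y0])
alpha_hat, beta_hat = np.linalg.lstsq(X, growth, rcond=None)[0]
print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}")

# Implied speed of convergence for growth measured over a span of
# T years (continuous-time approximation): lambda = -ln(1 + T*beta)/T.
T = 1.0  # growth is already expressed here as an annual rate
lam = -np.log(1 + T * beta_hat) / T
print(f"convergence speed ~ {lam:.3f} per year")
```

Spatial versions of this regression add spatially lagged terms or spatially structured errors, which is where the new spatial econometric tools mentioned above come in.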

Innovation and Regional Economic Development

Innovation is a dynamic process bringing about creative destruction and shifting the locus of innovation across industries and among locations. Thus, innovation is fundamental to economic growth and to variations in economic development across space. The section on Innovation and Regional Economic Development contains nine chapters. The first chapter, written by Edward Malecki, addresses the geography of innovation and reviews what we know from the work of economists as well as geographers. It begins with the simple world as grasped by patent data and models based on those data. Second, it examines what we know about innovation as a dynamic and complex – even messy – process. Third, this dynamic, complex messiness is seen in the location of innovation and of innovative capability: the geography of innovation at the global scale, which is itself the outcome of several distinct flows and forces. Finally, the chapter discusses briefly the degree to which policy can influence the geography of innovation. In the subsequent chapter, Philip Cooke surveys the theoretical and empirical research into regional innovation systems, a concept increasingly being applied in the world of policy analysis and practice. The chapter begins by addressing the debate on regional governance, learning, and policy contexts and proceeds to discuss the three-phase evolution of the regional systems of innovation perspective. Each of the phase changes is shown to denote important new ways of thinking about regional innovation and evolution. The most recent phase change represents the engagement of regional innovation systems, as a core subfield of evolutionary economic geography, with key concepts in the complexity sciences. The idea of innovation within regional innovation systems recognizes the importance of the feedbacks and interactions that are crucial to innovation. In the next chapter, Emmanouil Tranos highlights the importance of networks in the innovation process, while Thomas Scherngell reviews the literature on the geography of R&D collaboration networks, which has developed largely in Europe, accompanied by significant efforts to support such networks through public funding. In a subsequent chapter, Börje Johansson presents a theoretical framework in which lasting differences in firm performance are related to persisting differences in firms' innovation and adoption behavior. In recent years, the research focus has shifted from innovation and technology to the broader issues of knowledge and innovative capability. In line with this shift, Frank van Oort and Jan Lambooy focus attention on current theories and empirical research on cities and the knowledge economy, while Charlie Karlsson and Urban Gråsjö study knowledge spillovers and knowledge externalities as drivers of regional economic development. Knowledge spillovers here represent the transfer of knowledge from one agent to another working on similar topics, resulting in economic gains. Maryann Feldman and Scott Langford move on to review how network theory and social network analysis have been applied to address questions developed within the field of knowledge spillovers. The authors illustrate the potential of this research to illuminate the channels and mechanisms of spillovers, especially those related to geography and the geographic concentration of innovation. The final chapter, written by Michaela Trippl and Edward Bergman, reviews the literature on industrial districts, innovative milieus, and industrial clusters, which has enriched our knowledge about endogenous factors and processes driving regional
development and the role of the region as an important level of economic coordination. This class of stylized development concepts that emerged in the 1970s endeavors to account for successful regional adaptations to changes in the global economic environment. Each of these concepts grew out of specific inquiries into the causes of economic success in the midst of general decline by building upon the early ideas of Alfred Marshall in several ways. Neo-Marshallian districts found in Italy highlight the importance of small firms supported by strong family and local ties, while the innovative milieu concept places great emphasis on the network structure of institutions to diffuse externally sourced innovations to the local economy. Clusters have become far more general in scope, fruitful in theoretical insights, and robust in applications, informing the work of both academics and policy-makers around the world.

Regional Policy

Unbalanced, unequal, and problematic development among different regions has – over several decades – prompted the need for an active involvement of governments in regional development. Regional income disparities, the depopulation and abandonment of rural areas, decisions on new infrastructure, and the closure of old industries have driven public bodies to put in place dedicated policies that are generally grouped under the generic umbrella of "regional policies." One of the salient features of such policies was that the implementation of regional development programs, including measures to reduce unemployment or to favor balanced growth, was often undertaken by governments without seeking theoretical foundations and with a very limited conceptual or methodological basis. Awareness of this weak basis for regional interventions has fortunately prompted the design of more solid policy strategies, thanks to progress in regional analysis and novel research on the effectiveness of the policies, empirical models, and instruments applied, such as growth poles, economic zones, technology parks, or tax incentives to attract investment. Economists, geographers, and spatial planners have increasingly been involved in a critical examination of many such approaches and instruments: the results achieved, the failures, and the effectiveness of actions undertaken in relation to the objectives pursued. This has clearly stimulated a significant development in regional policy studies, particularly in regional science. The topic of regional policy presents important aspects worthy of study, some of which are treated directly or indirectly in other sections of this Handbook. The objective of the present section on Regional Policy is to provide new reflections on key questions in regional development, such as the justification for and focus of regional policy, while also including an analysis of experiences with policies applied in specific cases. The section consists of seven chapters, summarized briefly below. The chapter by Carlos Azzoni and Eduardo Haddad offers a systematic set of ideas on the need for regional policies, their justification, and their design, as well as possible methods for assessment. They highlight that any regional policy has sectoral and national consequences, while they also address the question whether public authorities should engage in explicit (place-based) regional policies or, rather, in (people-based) sectoral or social policies. Next, in their chapter, Gilles Duranton and Anthony Venables revisit the genesis of regional policies. They highlight that many development policies, such as infrastructure and local economic development schemes, are place-based, and they establish a rationale for effective policies and a framework for analyzing their effects by assessing their social value. Clearly, a solid analysis of regional policies cannot be conducted without considering a differentiation into relevant spatial levels of intervention. In this context, José-Enrique Garcilazo and Joaquim Oliveira Martins point out that policies aimed at territories characterized by political-administrative borders should address a division into three levels: large metropolitan areas, rural/intermediate regions close to cities, and remote rural regions. They point to the dominance of large metropolitan areas, particularly supported by the strong emerging tertiarization, and they argue that a balance in spatial-economic space has to be found, linked also to the place-based principle. Subsequent chapters analyze the features of regional policies in specific cases. Juan Cuadrado-Roura tracks the development of European Regional Policy (ERP), currently integrated into the broader concept of Cohesion Policy. The main contribution of his chapter is an assessment of the ERP's positive outcomes, as well as its defects and failures, which provide valuable policy lessons. Clearly, within Europe, Eastern and Central countries present markedly different characteristics, partly due to historical circumstances and political developments in the post-WWII period. Grzegorz Gorzelak analyzes the territorial policies applied in these countries and focuses on the contradictions between sectoral and regional policies. He assesses the changes resulting from their integration into the EU and the benefits of the ERP. One of the instruments used in several countries in Asia, Europe, and Latin America has been the creation of "Special Economic Zones" (SEZs). In his chapter, Christopher Hartwell reviews the key features of these zones, which differ markedly from one another despite their common aim of creating artificial, well-delimited areas whose advantages and benefits make them more attractive to investment and help to generate employment. The author ponders some key questions, including whether SEZs are a substitute for organically generated institutions of growth, including networks/clusters and agglomerations. Finally, rural areas deserve special attention. In a synthesizing chapter, André Torre and Frédéric Wallet provide a systematic inventory of the various instruments used for so-called "rural development policies," underscoring the diversity of instruments and the multiplicity of policy objectives. They clearly observe a move from a top-down approach towards a decentralized design process in regional policy for rural areas.

New Economic Geography, Evolutionary Economic Geography, and Regional Dynamics

Over the past years, several new strands of thinking in the field of geography – and the spatial sciences in general – have exerted a fundamental impact on the theory, methodology, application, and policy analysis in regional science and have also entered "mainstream" regional science. There is clearly not a single or unambiguous paradigm that has become the dominant approach to new modes of regional science thinking. On the contrary, a methodological pluriformity has emerged, as is witnessed by spatial complexity analysis, models of self-organizing systems, spatial resilience analysis, and models of adaptive behavior and control. All such – often dynamic – approaches have gradually acquired a solid position in the modern regional science literature and, therefore, deserve a position in an up-to-date Handbook of Regional Science. The rather long title of the present section, namely, New Economic Geography, Evolutionary Economic Geography, and Regional Dynamics, testifies to the broad coverage of new approaches that have entered the regional science domain in recent decades. This section, made up of thirteen chapters, mirrors the heterogeneity in new regional science developments, in tandem with novel approaches elsewhere. The first chapter, written by Philip McCann, is devoted to various schools of thought on economic geography, in particular the new economic geography, evolutionary geography, and urban science. It reviews some major thematic anchor points for the development of urban and regional research, including the role of institutions in spatial development. Next, Carl Gaigné and Jacques-François Thisse revisit the new economic geography from the perspective of a unifying framework, based on spatial locational decisions of firms and households at both macro- and micro-spatial levels, while accounting for urban morphology, quality of life, social and skill composition, and spatial sectoral specialization. Another contribution to a thorough reflection on the new economic geography is provided by Steven Brakman, Harry Garretsen, and Charles van Marrewijk, who interpret the evolution of the new economic geography mainly through developments in international trade theory. They offer a critical account of the limited empirical testability of new economic geography models, a limitation that also hampers their use in regional and urban policy. Another new perspective, relating to relational and evolutionary economic geography, is provided by Harald Bathelt and Pengfei Li. The authors investigate how these two new approaches can complement each other so as to obtain meaningful applications in economic geography. Particular attention is paid to cluster dynamics and relational-evolutionary approaches, leading to a tripolar framework of context, network, and action. In a similar vein, Ron Martin addresses path dependence from a spatial economic and evolutionary perspective. The author proposes a novel developmental-evolutionary model of path dependence, including lock-in cases, in a dynamic space-economy. New geographic perspectives regard agglomeration advantages as a fundamental principle. In this context, Gilles Duranton offers a critical review of agglomerations and job dynamics, against the background of the functionality of cities and the dynamic interactions between cities, jobs, and firms. Particular attention is given to the position of cities in developing countries. The subsequent chapter, by Riccardo Crescenzi, investigates the connection between economic geography theory and technological dynamics. His chapter explains the relationship between specialization
and diversification patterns, institutional-relational factors, and proximity, on the one hand, and territorial innovation profiles, on the other. New and dynamic geographical perspectives also prompt new views on barriers to policy. In this context, Henry Overman addresses in particular data availability, the limitations of spatial analysis, and spatial policy evaluation. Various new viewpoints on location analytics in the new geography are next highlighted by Alan Murray. In addition to issues like congestion, dispersion, concentration, market size, and production scale, geographic distance frictions remain a fundamental issue in the new economic geography. Spatial optimization, facility location and allocation, market areas, and spatial routing continue to be crucial components of spatial analysis, despite emerging uncertainty in space dynamics and the rise of big data issues. The next portion of this section addresses in particular spatial dynamics and flow analysis. First, Martin Andersson and Johan Larsson focus attention on spatial industrial dynamics in relation to entrepreneurship. Beyond new firm formation, the authors argue that "high impact" entrepreneurship deserves more interest, especially from the perspective of spatial spillovers. It is also stressed that entrepreneurship is key to understanding spatial path dependence and lock-in situations. In a subsequent chapter, Bruce Newbold studies another spatial-dynamic phenomenon in the new geography, namely, migration. He shows that studying inter-regional migration exclusively from the economic perspective of individual utility-maximizing responses by potential migrants limits the validity of the new economic geography for migration analysis. Clearly, another field of study in the new geography concerns environmental and climate conditions in a local-global society. In their chapter, Sandy Dall'erba and Zhenhua Chen address, inter alia, prominent environmental and climate impacts on agriculture and transportation. They propose adopting a systemic, multi-hazard approach in climate impact assessment, which may also open the way to spatial dependency analysis and to adaptation and mitigation strategies. Finally, the relationship between social capital and creativity is addressed by Charlie Karlsson and Hans Westlund, using the new economic geography as a foundation. After a critical review, the authors propose a synthesis approach in which an interactive knowledge-producing sector is modeled in the framework of a multi-region system. In this way, a new perspective regarding social capital is offered on cumulative causation, regional disparities, and industrial specialization in a spatial knowledge and network economy.

Environmental and Natural Resources

Environmental and natural resources have been seen as major drivers of regional development since the early history of regional science. Their importance has not diminished over the past decades, partly due to awareness of the necessity of these resources for human survival and partly due to newly emerging issues such as ecological sustainability, spatial resilience, and climate change. Space is a key dimension of the physical, ecological, and human processes that affect environmental quality and the health of natural resource stocks. Therefore, environmental and natural resource economics has long wrestled with the spatial elements of human behavior, biophysical systems, and policy design. The treatment of space by environmental economists has evolved in important ways over time, moving from simple distance measures to more complex models of spatial processes. The opening chapter of the section on Environmental and Natural Resources, written by Amy Ando and Kathy Baylis, presents knowledge developed in several areas of research in spatial environmental and natural resource economics. First, the chapter discusses the role played by spatial heterogeneity in designing optimal land conservation policies and efficient incentive policies to control pollution. Second, it describes the role space plays in nonmarket valuation techniques, especially the hedonic and travel cost approaches that inherently use space as a means to identify the values of nonmarket goods. Third, it explains a set of experimental empirical methods using spatial shocks to estimate the effects of pollution or environmental policy on a wide range of outcomes, such as human health, employment, firm location decisions, and deforestation. Finally, the chapter directs focus to spatial models of human behavior, including locational sorting and the interaction of multiple agents in a land use/conservation setting. The importance of uncertainty considerations in the design of environmental policies has long been recognized, and the literature dealing with this topic is vast. Uncertainty is present in nearly all aspects of natural resource and environmental management. The chapter written by Yacov Tsur and Amon Zemel reviews the various sources of uncertainty, the methodologies developed to account for them, and the implications for policy-makers. Natural resources and environmental issues frequently involve use by more than one agent. When this is the case, strategic considerations arise – over time, over space, or under uncertainty – and these considerations can be usefully studied with game-theoretic models, as illustrated in the chapter by Hassan Benchekroun and Ngo van Long. This chapter covers applications of game theory in environmental and resource economics, with a particular emphasis on non-cooperative transboundary pollution and resource games. In the subsequent chapter, John Loomis, Christopher Huber, and Leslie Richardson shift attention to valuing environmental and natural resources. Where market prices do not exist, economists can infer willingness to pay using revealed preference methods or by using a "simulated market" that asks respondents to state their willingness to pay. These revealed preference and stated preference methods are based on the same utility-maximizing framework economists use to estimate the demand for market goods. The next chapter, written by Philip Graves, describes the hedonic method as a means of valuing environmental quality improvements, with a focus on air pollution. The author stresses that this method requires very good, ideally perfect, perceptions of environmental benefits along with good or perfect knowledge of how environmental quality varies over space. This assumption, however, appears to be highly suspect in many settings. Next, Gara Villalba Méndez and Laura Talens Peiró review materials balance models that rely on physical laws to provide information useful not only for valuing natural systems but also for determining the amount and composition of the waste produced by human-induced processes.
Global warming, or climate change, is the most serious environmental problem confronting humans and societies today. This problem has international, national, and regional dimensions, and the presence of fundamental uncertainty about future benefits and costs makes it difficult to design and implement policy today. The chapter written by Daria Karetnikov and Matthias Ruth discusses the regional impacts expected from climate change and reviews differences in risk and mitigation capacities across major regions. Then, Emily Talen presents a discussion of what is meant by sustainability, first at the regional and then at the city level. Both scales have a long history in the planning domain, but the notion of a sustainable city is key to both realms and is the main focus of her chapter. While there is widespread agreement on broad parameters and principles of urban and regional sustainability, there are entrenched debates over implementation. The impact of human population growth on the environment represents another major challenge of our time. The final chapter in this section, written by Jill Findeis and Shadayen Pervez, reviews the relevant literature on the interrelationships among population growth, economic development, and environmental impacts at the micro-, meso-, and macro-scales.

Spatial Analysis and Geocomputation

The history of spatial analysis is noteworthy for its genesis in a number of different fields nearly simultaneously. Much of the development has been based on the types of data characteristic of the research being done in the respective fields. Geologists and climatologists, for example, tend to analyze continuous data. Economists and political scientists pay particular attention to time series data. Regional scientists, geographers, and sociologists are especially interested in point and area data. Transportation planners favor network data, while many environmental scientists use remotely sensed spatial data. Geocomputation is often used as an umbrella term for approaches to the analysis of problems that have a specifically geographic focus but – in contrast to conventional spatial analysis studies – typically rely on massive and high-dimensional data sets that require modern computational techniques as well as technical infrastructures whose performance scales well with massive data. The section on Spatial Analysis and Geocomputation contains eleven chapters. The first chapter, written by Michael Goodchild and Paul Longley, reflects on the key characteristics of geographic information, the problems posed by large data volumes, the relevance of geographic scale, the remit of geographic simulation, and the key achievements of geographic information science. Next, Michael de Smith reviews techniques that are applied in geocomputation and discusses some of the key underlying principles and issues. Chris Brunsdon shifts attention to Bayesian spatial analysis, outlining the key ideas, emphasizing the role of Markov Chain Monte Carlo approaches, and discussing techniques for three key types of spatial data: point data, point-based measurement data, and area data. Geovisualization and interaction technologies may give users a gateway into their data. In the next chapter, Ross Maciejewski presents geovisualization as a means for exploring spatial data with the aim of creating new knowledge and providing new scientific insights. Geographical visualization is not purely about generating maps of data. It is also about generating tools and techniques for visually representing spatial and spatiotemporal data and facilitating the exploration of these data for hypothesis generation and exploration. Closely related to this, Linda See addresses the topic of web-based tools for Exploratory Spatial Data Analysis (ESDA), a collection of tools and approaches for the statistical analysis of spatial and spatiotemporal data. The starting point of her chapter is a brief introduction to ESDA, followed by a historical overview of software tools that emerged from the mid-1990s onwards while the Internet was still evolving. The early web-based developments in ESDA are presented, followed by the events that led to web-based mapping and web GIS. The author goes on to describe the web-based tools that are currently available and reflects on future developments in data-rich environments. The field of spatiotemporal data mining emerged out of a need to create effective and efficient techniques to turn enormous amounts of data into meaningful information and knowledge. Tao Cheng and associates offer a state-of-the-art review of spatiotemporal data mining research, with emphasis on three key areas: prediction, clustering/classification, and visualization. The Modifiable Areal Unit Problem (MAUP) is a serious analytical issue for analysts using spatial data and is discussed by David Manley in the next chapter. The chapter provides an overview of the MAUP along with its two related aspects, the scale effect and the zonation effect, and details, moreover, the role of spatial autocorrelation in understanding the processes in the data that lead to statistical non-stationarity. The next chapter, written by Clio Andris and David O'Sullivan, shifts attention to spatial network analysis. Spatial networks organize and structure human social, economic, and cultural systems. The analysis of network structure is rooted in mathematical graph theory, and spatial networks are a special type of graph embedded on the Earth's surface. Thus, their analysis necessitates the fusion of graph-theoretical and geographical concepts. Key concepts and definitions from graph theory are reviewed and used to develop a variety of graph structural measures. Particular emphasis is placed on three major concepts: high-level network structural features of centrality, cohesive subgraphs, and structural equivalence. With these metrics in mind, the authors describe considerations for their use within a spatial context. The chapter by Keith Clarke then moves on to cellular automata and agent-based models, two approaches to spatial modeling that have enjoyed increasing popularity in recent years. For each type of model, the origins are explored, as are the key contributions and applications of the models and the software used. The next chapter, written by Alison Heppenstall and Dianna Smith, provides a brief overview of spatial microsimulation, with a focus on the main algorithms that are typically employed. The final chapter in the section, on fuzzy modeling in spatial analysis, is written by Yee Leung. Fuzzy set theory provides a strict mathematical framework in which imprecision
in the sense of vagueness can be precisely and rigorously analyzed. The author first outlines the basic definitions and operations of the theory and then moves to applications, with special reference to spatial classification and grouping as well as spatial optimization in a fuzzy environment.
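The basic definitions and operations that Leung outlines can be illustrated compactly. The sketch below uses the standard Zadeh operators (min for intersection, max for union) on invented membership functions; it is an illustration of fuzzy set basics, not code from the chapter.

```python
import numpy as np

# A fuzzy set is represented by a membership function mapping each
# element of a universe to a degree of membership in [0, 1].
distance_km = np.array([0.0, 2.0, 5.0, 10.0, 20.0])

# Hypothetical membership in the fuzzy set "close to the city center":
# full membership up to 2 km, declining linearly to zero at 20 km.
close = np.clip((20.0 - distance_km) / 18.0, 0.0, 1.0)

# Hypothetical membership in "well served by transit" (invented values).
served = np.array([1.0, 0.9, 0.6, 0.4, 0.1])

union        = np.maximum(close, served)   # fuzzy OR  (max operator)
intersection = np.minimum(close, served)   # fuzzy AND (min operator)
complement   = 1.0 - close                 # fuzzy NOT

# Degree to which each site is both close and well served by transit;
# spatial classification assigns sites to the class with the highest
# membership rather than forcing a crisp yes/no boundary.
print(intersection)
```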

Spatial Statistics

The roots of spatial statistics date back to Pearson and Fisher, but its modern manifestation is mainly due to Whittle, Moran, and Geary. From Cliff and Ord's books and papers of the late 1960s to the early 1980s comes the basic outline of what constitutes the key features of spatial statistics. The section on Spatial Statistics contains eleven chapters that mirror the diversity of statistical approaches, methods, and techniques used in regional science. The first chapter, written by Peter Atkinson and Christopher Lloyd, provides an introduction to a set of geostatistical principles and techniques that can be applied to characterize or model spatial variation and to use such models to optimize the mapping, simulation, and sampling of spatial properties. The chapter focuses on the kinds of applications likely to be of interest to regional scientists. When surveying a phenomenon characterized by spatial variation, it is necessary to find optimal sample locations in the study area. This problem is referred to as spatial sampling and is treated in the next chapter, by Eric Delmelle. The chapter reviews the fundamentals of spatial sampling and design, discussing the characteristics and merits of different designs under different circumstances, while a numerical example illustrates a modeling strategy that uses covariate information to guide the location of new samples. The chapter by Jürgen Symanzik on exploratory spatial data analysis – an extension of exploratory data analysis geared to dealing with the spatial aspects of data – provides an overview of methods, techniques, and software solutions. The next chapter, by Daniel Griffith and Yongwan Chun, directs attention to (eigenvector) spatial filtering, a spatial statistical methodology that enables spatial autocorrelation effects to be accounted for while preserving conventional statistical model specifications. David Wheeler addresses Geographically Weighted Regression (GWR), which allows relationships in a regression model to vary over space. In contrast to traditional linear regression models, which have constant regression coefficients over space, regression coefficients are estimated locally at spatially referenced data points. The motivation for the introduction of GWR is the idea that a set of constant regression coefficients cannot adequately capture spatially varying relationships between covariates and an outcome variable. The author emphasizes that geographically weighted regression is a relatively simple and effective tool for spatial interpolation of an outcome variable, but a more problematic tool for inferring spatial processes in regression coefficients. The more complex approach of Bayesian spatially varying coefficient models has been demonstrated to better capture spatial non-stationarity in regression coefficients and is recommended as an alternative for inferential analysis.
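The local estimation idea behind GWR is easy to state: at each location, run a weighted least squares regression in which nearby observations receive larger kernel weights. A minimal sketch on synthetic data follows (the Gaussian kernel, the fixed bandwidth, and all numbers are illustrative assumptions; production GWR implementations also select bandwidths by cross-validation and compute diagnostics).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic spatially referenced data (illustrative only): coordinates,
# one covariate, and an outcome whose slope drifts across space.
coords = rng.uniform(0, 10, size=(100, 2))
x = rng.normal(size=100)
true_slope = 1.0 + 0.2 * coords[:, 0]          # slope varies west-east
y = 2.0 + true_slope * x + rng.normal(0, 0.1, 100)

X = np.column_stack([np.ones(100), x])
bandwidth = 2.0  # kernel bandwidth; in practice chosen by cross-validation

def gwr_at(site):
    """Weighted least squares at one location with a Gaussian kernel."""
    d = np.linalg.norm(coords - site, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    W = np.diag(w)
    # Local estimate: (X'WX)^{-1} X'Wy
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Local intercept and slope at two ends of the study area.
print(gwr_at(np.array([1.0, 5.0])))   # local slope should be near 1.2
print(gwr_at(np.array([9.0, 5.0])))   # local slope should be near 2.8
```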
The next chapter, written by Peter Congdon, offers a comprehensive account of Bayesian modeling of spatial data. Bayesian inference and applications have been a central element in recent developments in spatial statistics. This influence has rested on advances in computer-based estimation via Markov Chain Monte Carlo (MCMC), including recent developments in random walk MCMC approaches. Bayesian ideas have been particularly influential in spatial regression modeling, disease mapping, and the analysis of point-referenced spatial data. Starting with models for univariate spatial data, models can be extended to multivariate outcomes or to a space-time framework. Spatial models vary in how spatial dependence or correlation is represented, including neighborhood dependence in Markov field models, explicit spatial decay in point-referenced spatial process models, and spatial lags or residual correlation effects in spatial autoregressive models. Bayesian implementation has been facilitated by a much improved computational environment, centered on the R statistical environment. In a subsequent chapter, Virgilio Gómez-Rubio, Roger Bivand, and Håvard Rue present integrated nested Laplace approximations as an alternative approach for fitting Bayesian hierarchical models in which the latent effects are a Gaussian Markov random field. Next, Robert Haining and Guangquan Li compare Bayesian and frequentist (or likelihood-based) approaches to inference and argue that Bayesian inference is more appropriate to an observational science, while Alan Gelfand presents multivariate spatial process models for spatially referenced multivariate data in the form of vectors observed at a finite set of spatial locations. Such models need to capture both the dependence among the components of the vectors and the spatial dependence across the locations of the vectors. The chapter develops classes of models that provide the desired dependence. An example, using soil nutrient data from Costa Rica, is presented for illustration purposes. The chapter by Sergio Rey shifts attention to space-time data analysis. Particular emphasis is devoted to exploratory methods for space-time data, focusing on the evolution of spatial patterns as well as on the identification of temporal dynamics that cluster in space. The final chapter in the section, written by Lance Waller, reviews methods to detect, evaluate, and interpret space-time patterns within disease surveillance data.

Spatial Econometrics

Since Jean Paelinck and Leo Klaassen's description of the field in 1979 and Luc Anselin's influential book Spatial Econometrics, published in 1988, the field of spatial econometrics has blossomed to address the fact that regional data often exhibit dependence between the observed outcomes of nearby regions, as well as between the characteristics of regions used as explanatory variables in regional science relationships. Conventional regression models and associated estimation methods assume independence between observations, so spatial econometrics focuses on the modifications needed for the estimation and interpretation of regression models involving dependence in outcomes located at points in space.

XLVI

Regional Science in Perspective: Editorial Introduction

The section on Spatial Econometrics is made up of eleven chapters. Separate chapters are devoted to alternative estimation approaches, including a chapter by Kelley Pace on maximum likelihood estimation, another by Jeffrey Mills and Olivier Parent on Bayesian MCMC estimation, and one by Ingmar Prucha on instrumental variables/method of moments estimation. The section contains chapters by Julie LeGallo on cross-section regression models and by Paul Elhorst on spatial panel models and common factors. In his chapter, Elhorst provides a survey of the literature on spatial panel data models and a motivation for the use of common factors. Both static and dynamic models, as well as dynamic models with common factors, are discussed. Next, a chapter written by Xiaokun Wang focuses on modeling situations involving binary or censored data outcomes, while another, written by Christine Thomas-Agnan and James LeSage, considers modeling origin-destination flows. The non-linear nature of spatial regression models requires special attention to the interpretation of estimates from these models, a topic covered in the chapter on interpreting spatial econometric models by James LeSage and Kelley Pace. Next, a chapter on heterogeneous coefficient spatial regression panel models, by Yao-Yu Chih and James LeSage, discusses using the time dimension of sample data covering long time spans to estimate coefficients for each region in a space-time panel setting. Since regional science theories often focus on interactions between regions or between consumers/firms located in space, region-specific or consumer/firm-specific estimates open up new ways to test our theoretical models more rigorously, as well as new challenges for interpretation. Another chapter, focusing on endogeneity in spatial models, by Julie LeGallo and Bernard Fingleton, discusses issues that arise when explanatory variables exhibit simultaneity in models for spatially dependent data generating processes. There is a great deal of confusion regarding simultaneous interaction between regional/spatial outcomes in the dependent variable and endogenously determined explanatory variables, and this chapter clarifies the issues and provides various suggestions and approaches needed to address this type of applied modeling situation. A final chapter, entitled "Using Convex Combinations of Spatial Weights in Spatial Autoregressive Models," by Nicolas Debarsy and James LeSage, shows how spatial interaction can be broadened to include other types of interaction between the regional outcomes captured in the dependent variable vector. There has been a great deal of dissatisfaction with the exclusive use of spatial weight matrices as the basis for modeling interaction between regions, which ignores other possible types of interaction between regions, firms, or consumers located in space. For example, regions could interact based on economic or cultural connections, historical patterns of migration or trade flows, formal and informal supply chain or financial networks, and so on. The authors show how to estimate models that combine multiple connectivity matrices by assigning scalar weights to form a convex combination of each type of connectivity. Estimates of the scalar weights, which sum to unity, allow inferences regarding the relative importance of the various types of connectivity between regions.
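The convex-combination idea in the Debarsy and LeSage chapter can be sketched in a few lines. The matrices and weights below are invented for illustration; the chapter's contribution is estimating the combination weights within the model, rather than fixing them as done here.

```python
import numpy as np

n = 5

# Two hypothetical connectivity matrices for the same five regions.
# W_space: spatial contiguity for regions arranged on a ring, row-normalized.
W_space = np.zeros((n, n))
for i in range(n):
    W_space[i, (i - 1) % n] = 0.5   # each region borders its two neighbors
    W_space[i, (i + 1) % n] = 0.5

# W_trade: an invented trade-flow-based connectivity, row-normalized.
rng = np.random.default_rng(1)
T = rng.uniform(0.0, 1.0, (n, n)) * (1.0 - np.eye(n))
W_trade = T / T.sum(axis=1, keepdims=True)

# Convex combination: nonnegative scalar weights summing to one.
gamma = np.array([0.7, 0.3])
W = gamma[0] * W_space + gamma[1] * W_trade

# W inherits row-stochasticity and can replace the single weight matrix
# in, e.g., a spatial autoregressive model y = rho*W y + X beta + e;
# estimating gamma alongside rho (as in the chapter) indicates the
# relative importance of each connectivity channel.
print(W.sum(axis=1))  # each row sums to 1
```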

A Content Cloud Analysis of the Handbook

A content cloud (also known as a word cloud, text cloud, or tag cloud) is the result generated by an image-based tool that aims to present the most prominent words or terms in a text. The tool uses a web-based algorithm to extract the most essential or pertinent expressions in textual data and summarizes the essence of a text in visualized format. Usually, this presentation uses different font sizes and different colors to emphasize the differences between prominent text expressions in terms of their frequency and importance in the text. The content cloud generator usually decomposes a written text into component identifiers and then counts their frequency of appearance in the text concerned. This leads to a hierarchical visual presentation of the degree of prominence of words or terms in the textual data. Content cloud analysis has gained considerable popularity in the past decade as an electronic tool for identifying core messages or trend expressions in a text, so that its essence can easily be grasped by the reader. This multi-volume reference work covers a wide variety of concepts, methods, and research tools. Despite their diversity and variability in use, there is a wealth of concepts and terms that show up frequently in the various chapters of this Handbook. The terms with the highest frequency of appearance throughout the entire reference work are summarized in the form of a content cloud generated using a digital search algorithm. The multicolor visualization – indicating commonalities and frequencies of terms through color intensities and font sizes – offers a balanced representation of the main substantive issues covered by this reference work. Figure 1 presents the content cloud results from an unsupervised visualization exercise. This means that no words or terms are imposed on the algorithm a priori; it selects, by way of auto-organization, the most important and frequently occurring core words as they appear in each of the 110 chapters, after the elimination of terms that are less relevant in this context, such as region, science, or space. It is easily seen that the figure provides an interesting hierarchy of concepts or words such as data, knowledge, economic, growth, development, social, firms, labor, market, capital, network, local, time, location, and so forth. It is noteworthy that many of these higher-order terms also appear frequently in modern regional science research, in concepts like the knowledge economy, data analytics and spatial econometrics, economic growth, regional development, social capital, innovation networks, knowledge spillovers, and the labor market. The findings from our content cloud analysis appear to match surprisingly well with contemporary regional science research issues, seen from a bird's eye view.

Fig. 1 Content cloud of relevant terms in the Handbook of Regional Science
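The counting step described above – tokenize, drop uninformative terms, tally frequencies, and rank – is simple to reproduce. Below is a minimal sketch on a toy corpus; the texts and stop-word list are invented, and the editors' actual tool and exclusion list are not specified beyond the examples given.

```python
import re
from collections import Counter

# Toy corpus standing in for chapter texts (illustrative only).
chapters = [
    "Spatial econometric models of regional economic growth and data.",
    "Knowledge spillovers, innovation networks and regional development.",
    "Labor market data, social capital and local economic growth.",
]

# Terms deliberately excluded, as the editors describe for words such
# as "region", "science", or "space", plus common stop words.
stopwords = {"and", "of", "the", "region", "regional", "science", "space"}

counts = Counter()
for text in chapters:
    tokens = re.findall(r"[a-z]+", text.lower())
    counts.update(t for t in tokens if t not in stopwords)

# A cloud generator would now map frequency to font size; here we just
# rank the terms, which is the substantive content of the visualization.
for term, freq in counts.most_common(8):
    print(f"{term:12s} {freq}")
```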

Closing Remarks

Regional science is a fascinating scientific domain characterized by great vitality and innovativeness. Its multidisciplinary orientation and the active involvement of thousands of scholars with different backgrounds lead to a permanent evolution and creative adaptation to new spatial issues. Regional science is "work in progress," as is testified by this second edition of the Handbook of Regional Science, which shows significant differences compared to its first edition published in 2014. Let us conclude by hoping that readers find the new, rewritten, and updated chapters informative and instructive. If readers believe that there are important omissions in the reference work, or that the reference work could have benefited from a different structure, we hope that they will share their thoughts on these and related matters with us. In particular, we welcome suggestions and proposals for research topics that arguably have been neglected. We would like to thank the members of the Springer Reference Editorial Team, in particular Barbara Fess, Esther Niederhammer, and Barbara Wolf, for their support in initiating the project as well as for providing invaluable guidance throughout the process. And of course, we are deeply grateful to the Section Editors and the authors for their work. Without their valuable assistance no second edition would exist.

Manfred M. Fischer
Peter Nijkamp

Section I: History of Regional Science

Regional Science Methods in Planning

Peter Batey

Contents
1 Introduction
1.1 The Origins of Regional Science
2 Bringing Planners on Board in the Early Days
2.1 Before Regional Science
2.2 Helping Planners to Benefit from Regional Science
2.3 A Textbook for Regional Science Methods
3 Regional Science and Penn
3.1 Isard's Planning Ally: Robert Mitchell
4 Testing and Demonstrating Regional Models
4.1 The Philadelphia Regional Input-Output Study
4.2 The Penn-Jersey Transportation Study
4.3 Czamanski's Regional Studies in Nova Scotia
5 Regional Science Methods and British Plan-Making
5.1 The Slow Take-Up of Social Science Among Planners
5.2 Isard's Efforts to Establish a British Section
5.3 Brian McLoughlin's Systems Approach to Planning
5.4 Alan Wilson's Role in Advancing Planning Analysis
5.5 Pragmatic Practitioners
5.6 Methods of Regional Analysis Revised: A Missed Opportunity?
6 Conclusions
7 Cross-References
References

Abstract

This chapter focuses on the evolving relationship between regional science and planning since the mid-1950s, when the Regional Science Association was founded. Attempts prior to this to make planning more "scientific" are briefly
reviewed before the efforts of Walter Isard and others to involve planners in regional science are examined in detail. After some initial setbacks, a major landmark was the publication of the first textbook, Methods of Regional Analysis. Isard's key planning ally was Robert Mitchell, head of the planning school at Penn. As a highly regarded academic and practitioner, Mitchell is seen to have had a crucial role in endorsing regional science as a valuable aid to planning. The chapter reviews ambitious demonstration projects from the 1960s that illustrate some of the most advanced regional models of that time and the challenges faced in implementing them. Isard's drive to establish regional science in Europe hit problems in Britain, where planners in particular were not familiar with quantitative methods and abstract thinking. The chapter describes how ultimately British planners came to adopt regional science methods as part of Brian McLoughlin's "systems approach to planning." It underlines the role of mathematician Alan Wilson, a strong advocate of planning analysis. The chapter emphasizes the pragmatic way in which practitioners view the toolbox of planning methods, regarding usefulness as much more important than sophistication. "Aim too high and you will risk losing your audience" would seem to be the recurring message as far as planners and regional science methods are concerned. The chapter ends by considering the prospects for greater use of regional science methods by planners. While opportunities may have been missed in the past, a number of recent developments give some cause for optimism about the future.
Keywords

Regional science methods · Planning practitioners · Plan-making · Regional science association

1

Introduction

1.1

The Origins of Regional Science

Regional science as a field of study is widely acknowledged to have its origins in the late 1940s and 1950s and owes its existence to the herculean efforts of one man – American economist Walter Isard. Isard was deeply dissatisfied with the failure of his fellow economists to handle space in their deliberations and felt that, to remedy this, the rigorous analysis of cities and regions would benefit greatly from an interdisciplinary approach that drew from other social sciences as well as from economics. He organized a series of informal meetings among regional researchers, which began to sketch out the scope and content of a new field to be named "regional science." Isard emerged as a tireless advocate for regional science, first among economists and later sociologists, geographers, planners, economic historians, and political scientists. He was able to tap into a growing interest in regional problems, bringing with it a demand for innovative theories and techniques that might aid analytical understanding and assist in the making of public policy (Isard 2003).

Among academics and practitioners alike, there was keen anticipation of what could be achieved in regional science. The success of these early exploratory meetings encouraged Isard to found the Regional Science Association (RSA) in 1954 (renamed the Regional Science Association International (RSAI) in 1991). Initially regional science was almost entirely a North American pursuit, but in the following decade and a half, Isard was to devote much of his energy to establishing and promoting sections of the Association throughout the world. So, what was it that the Regional Science Association hoped to achieve? Here it is helpful to quote from the constitution:

. . . the main objective shall be to foster exchange of ideas and promote studies focusing on the region and utilizing tools, methods and theoretical frameworks designed for regional analysis, as well as concepts, procedures and analytical techniques of the various social and other sciences. (Isard et al. 1998, 1)

On this last point, Isard was particularly keen to encourage dialogue between academics from different disciplines. Thus, for example, in planning the program for the 1956 meetings in Cleveland, he invited five speakers to address the conference on the contribution they felt regional science would make to their discipline, choosing to focus upon geography, planning, economics, sociology, and political science. In those days there were no parallel sessions, which meant that delegates had the chance to hear the views of those from subjects other than their own. Isard was clear on where the emphasis should lie with regard to interdisciplinarity:

The first task was not to immerse ourselves into problems and policy; rather, we had to bring together what knowledge had been accumulated in the several social science and professional fields so that each of us, given his specialized background, could increase our common knowledge and know-how concerning relevant disciplinary concepts, theories, and techniques. (Isard 1979, 10)

By the end of the 1950s, it was evident that this approach was producing mixed results. In many social science disciplines, including political science, sociology, economic history, and anthropology, despite Isard's best efforts, "no significant contact had been achieved" (Isard 2003, 189). The chief exceptions were geography and planning, where there was much greater dialogue with the emerging core of regional science and with Isard's own discipline of economics. This chapter explores the evolving relationship between regional science and planning since the mid-1950s, when the Regional Science Association was founded. It focuses on developments in North America and Great Britain.

First, a comment about terminology. The generic terms "planner" and "planning" will be used throughout the chapter rather than attempting to make fine distinctions between various kinds of planner and planning: the usage of the labels "city", "town", "regional", "strategic", "spatial", and "metropolitan" has changed over time, and the terms sometimes take on a different meaning in different parts of the world.

2

Bringing Planners on Board in the Early Days

2.1

Before Regional Science

Planners have long debated the role of surveys in the making of plans. An important early influence was a series of major social surveys on both sides of the Atlantic (Batey 2018), notably in Pittsburgh and Chicago in the United States and London and York in the United Kingdom, in the early years of the twentieth century. They generated substantial publications documenting in considerable detail the plight of the poor in large cities and making an urgent call for government action. This led to greater awareness among planners of systematic survey methods, the use of statistical methods to analyze survey results, and the role that social mapping could play in presenting those results. In particular, the Pittsburgh survey of 1907, led by Paul Kellogg, encouraged many American planners to undertake systematic data collection exercises. This activity was spurred on by an efficiency craze that swept the country from 1911 onward, linking data gathering with the scientific management promoted by Frederick W Taylor. "Taylorism", as the movement became known, was embraced enthusiastically by many business leaders, city officials, and politicians. Understandably, many planners felt that in order to convince civic leaders of the merits of their city plans, they too needed to be more scientific in their diagnosis of a city's problems. In their view, planning was about more than civic beautification, or the "City Beautiful". Some went so far as to present planning as an exact science: the "City Scientific". Although the City Scientific movement failed to attract lasting support, it nevertheless serves as a useful precedent for what followed many years later when Isard was seeking to launch regional science.

There are several other landmarks in the "prehistory" of regional science. The Regional Plan of New York and Its Environs (1929) was the first long-range, regionwide plan of its kind. It collected and examined extensive quantitative data about demographics, population distribution, and economic conditions; it advanced the development of population projection methods; it first introduced economic base theory; and it developed new theories of intraurban location. The plan stands as an exemplar of the important contribution applied (and interdisciplinary) social science can make to regional planning. The US National Resources Planning Board (1933–1943), a small independent federal agency reporting directly to the President, did much to advance the analysis of industrial location, culminating in a major report, Industrial Location and National Resources. The methodology set out in this report represented a major advance for contemporary planning practice. Significantly, many of the leading planners and economists subsequently involved in creating the RSA, including Isard himself, were at some time staff members of, or consultants to, the NRPB.

2.2

Helping Planners to Benefit from Regional Science

Regional science developed at an opportune time as far as planners were concerned. By the mid-1950s, there was burgeoning interest in coping with high levels of projected growth in metropolitan regions and the implications this would have for planning the transportation system. In the mid-1950s, fresh thinking about plan-making was starting to emerge. In two important books, Meyerson and Banfield (1955) introduced the notion of a rational planning process, allied to rational decision theory, while Mitchell and Rapkin (1954) pointed to the need for planners to understand, and to plan for, the interrelationship between land use and traffic. Arguably, more so than at any point in the previous 50 years, planners were receptive to advances in planning technique. It is not surprising, therefore, that from the outset planners played a full part in the Regional Science Association.

Isard, who had earlier worked with Wassily Leontief on the Harvard Economic Research Project, took up a post in the MIT planning school as a regional economist, a position that carried with it the Directorship of the Section of Urban and Regional Studies. This provided him with ample scope to work on a range of applied regional research projects, while at the same time continuing to develop the theoretical side of regional science, culminating in a well-received text, Location and Space Economy (Isard 1956).

Isard set high academic standards for regional science. He pitched the first RSA conferences at a demanding technical level but quickly realized that planners in particular were not trained to think of regions in an abstract way and generally had little grounding in the mathematics and statistics that would enable them to fully understand and apply the new regional science techniques. He therefore took steps to "bring planners on board," organizing joint meetings of the American Institute of Planners and the RSA. From 1956 onward, these meetings focused on economic base analysis, gravity models, and interregional linear programming models. Isard himself played a major role in these sessions, as did his research students Ben Stevens, Robert Coughlin, and Gerald Carrothers. Indeed, the lectures drew extensively on their PhD work and were technically very demanding. Clearly some of the material would be familiar to the small minority who were regular readers of their professional journals, but for the majority the content of the lectures and the overall approach were entirely new.

Isard expressed himself well pleased with the level of interest generated among planners and others working in the expanding field of metropolitan planning. But among his audience, there were undoubtedly those who needed more convincing about the value of regional science. The fact that the cynics among them gave Isard the nickname "Wizard" suggests that they were left mystified by his presentations, as recollected by John Reps, Cornell University, interviewed on 21 June 2017. By catering deliberately to the high fliers, he ran the serious risk of alienating many potential adopters of these new ideas. Notwithstanding the limited impact of this educational venture, concern continued to mount over the widening gap between academics and practitioners in regional science. This led to the preparation of a statement outlining specific ways in which this gap could be filled: Needed Metropolitan and Urban Research. The paper was discussed in the business meeting at the 1958 RSA Meetings in Chicago. A committee, made up of representatives from most of the disciplines contributing to regional science, was set up to explore what exchange of information, stimulation of research, and types of meetings would best serve the specific metropolitan field and how such gaps could be filled by the RSA.

2.3

A Textbook for Regional Science Methods

With hindsight, it is clear that one of the biggest obstacles to bringing planners on board was the absence of a comprehensive textbook covering the full range of regional science methods. The solution came in 1960 with the publication of Methods of Regional Analysis. Written by Isard, again in collaboration with his PhD students, the book was an extremely ambitious undertaking, running to 784 pages. First conceived in 1954, the book was intended:

. . . to make available in a relatively simple and clear-cut form the several techniques of regional analysis which have proved to have at least some validity. An attempt is made to set forth the virtues and limitations of each of these techniques so that the research worker and policy maker may be able to judge its applicability for a particular regional situation and problem. (Isard et al. 1960, xi)

Topics covered include population projections, migration estimation, economic base analysis, regional multipliers, industrial location analysis, industrial complex analysis, and input-output analysis. There is a remarkably full bibliography that captures a vast amount of what was then the current literature, providing a valuable resource for those readers wanting to delve more deeply into the subject material. The most challenging part of the book comes in the final four chapters, which set out the techniques of interregional linear programming, gravity models, and, most ambitious of all, an attempt to integrate the various techniques under a single framework called Channels of Synthesis. Unlike the earlier sections of the book that deal with the state of the art, these chapters are included to show what the future of regional science might hold. Whereas the bulk of the book may be seen as directed largely toward an audience of practitioners, these later chapters are aimed at (present and future) academics wishing to embrace the new field of regional science.

In a retrospective review written in 2009, Andrew Isserman described the book as "the practical side of regional science that caught on worldwide," commenting that "It is more thorough and thoughtful than the slimmer texts written since" and "Any planner who uses these methods will benefit from the rock-solid, wise discussion it provides" (Isserman 2009, 265). Methods of Regional Analysis would no doubt have played an important role in Isard's efforts to spread the word about regional science beyond the United States. Even today, nearly 60 years after it was published, the book is a key reference, frequently quoted. Anyone reading it now, however, will find that the methods described are largely those originating from economics and demography: the book came too early to be able to reflect the growing interdisciplinary nature of regional science. This means, for example, that it omits important developments in quantitative human geography, dealt with fully in Peter Haggett's influential Locational Analysis in Human Geography, published just 5 years later (Haggett 1965). And those expecting to learn about suitable computational methods will also be disappointed: the use of computers was then in its infancy. Perhaps more surprisingly, there is very little coverage of matters to do with data, especially since this was something that loomed large among those participating in the early discussions about regional science (Isard 2003).
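
As a flavor of the planner-facing techniques the book did cover, consider economic base analysis. The sketch below is a minimal illustration, not drawn from the book itself, using entirely hypothetical employment figures: location quotients identify industries whose regional employment share exceeds the national share, the excess is treated as export ("basic") employment, and the ratio of total to basic employment gives the economic base multiplier.

```python
import numpy as np

# Hypothetical employment by industry for a region and for the nation.
region = np.array([4000.0, 2500.0, 1500.0, 2000.0])
nation = np.array([900000.0, 400000.0, 500000.0, 700000.0])

# Location quotient: regional employment share relative to the national share.
lq = (region / region.sum()) / (nation / nation.sum())

# Industries with LQ > 1 are assumed to export; the excess is "basic" employment.
basic = np.where(lq > 1, region * (1 - 1 / lq), 0.0)

# Economic base multiplier: total employment supported per unit of basic employment.
multiplier = region.sum() / basic.sum()
print("Location quotients:", np.round(lq, 2))
print("Basic employment:", np.round(basic, 0))
print("Base multiplier:", round(multiplier, 2))
```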

3

Regional Science and Penn

3.1

Isard’s Planning Ally: Robert Mitchell

Isard's move to the University of Pennsylvania (Penn) in 1956 proved to be a milestone in the development of regional science and particularly in its evolving relationship with planning. At MIT, he and his colleague Lloyd Rodwin had been developing plans to establish a PhD program in Planning Analysis. When the proposal was deferred, because of the uncertainty surrounding the impending arrival of a new departmental chair, he began to look elsewhere. He chose to move to Penn, ostensibly because the Economics Department was at a low ebb and was receptive to a change in direction in order to remain competitive. Isard moved quickly to found a Department of Regional Science, in effect replacing Penn's long-established Department of Geography and Industry within the Wharton School. His new department took over the teaching of geography and in so doing was able to make additional staffing appointments. This helped increase the Department's capacity to undertake ambitious research projects and to offer an attractive range of graduate-level courses.

This, however, was only part of the story: Isard could also see that he would have important allies in the Department of City and Regional Planning. Established some 5 years earlier, the Department had succeeded in attracting an extremely strong faculty. Table 1 lists those planning faculty members based at Penn at around the time that Isard took up his appointment as a Professor of Economics. In the decades that followed, many of these individuals were to make important contributions to regional science. A key figure here was Robert Mitchell, who had joined the Department when it was created in 1951 and had served as its first Chairman. Mitchell's contribution to regional science turned out to be crucial. He had extensive experience as a planning practitioner, working on housing renewal projects in Chicago in the 1930s and with the National Resources Planning Board in the early 1940s as head of its urban division. His main project with the NRPB was to develop a standard method of plan-making capable of being used in cities lacking trained planning staff. Tacoma, Corpus Christi, and Salt Lake City served as examples of how the method could be applied in practice. At Columbia University in the early 1950s, he had collaborated with Chester Rapkin on a pathbreaking study, Urban Traffic: A Function of Land Use (Mitchell and Rapkin 1954), which did much to advance thinking about transportation modeling.

Table 1 City and Regional Planning at Penn: The "Golden Era" of the late 1950s/early 1960s
The Penn planning faculty – an all-star team:
Robert B Mitchell, Chair (Transportation Research)
William L C Wheaton (Housing)
Martin Meyerson (Theory and Practice of Planning)
Walter Isard (Economics and Regional Planning)
Ian McHarg (Landscape Architecture and Regional Planning)
Chester Rapkin (Transportation and Land Use Research)
David Crane (Physical Planning and Design)
Lewis Mumford (City Form and Culture)
Research associates at the Institute of Urban Studies: Herbert J Gans, John Dyckman, and Britton Harris
Complementing the core faculty were research assistants Tom Reiner, Janet Scheff Reiner, and Paul Davidoff
Source: Feld (2011)

In the view of one of his close collaborators, Henry Fagin, Mitchell had "emerged not only as keen analyst of urban structure and change but also as a respected leader of public thought and action" (Fagin 1963, 11). It was perhaps not surprising, therefore, that Mitchell was chosen as one of the early Presidents of the Regional Science Association; indeed he was the first person, other than Isard himself, to lead the organization. Mitchell and Isard had worked closely together over the years, so that, for example, it was Mitchell whom Isard chose to chair one of the early (1952) interdisciplinary meetings on metropolitan regional research (Isard 2003, 53). Mitchell's Presidential Address, delivered in 1960, looked at how regional science could assist planning and outlined new thinking about the plan-making process, drawing on the recent work of Meyerson and Banfield on rational decision-making in planning. This provided a useful framework for thinking about analytical planning methods and the role they could play. Mitchell had this to say about what regional science means for the planner of cities:

It can . . . help provide a discipline for thinking about problems. Even if the method in all its details may be non-operational, the framework may be still valid; and if one has to make assumptions, even if non-quantitative, they can be more easily recognized. (Mitchell 1961, 7)

This strong endorsement of the value of regional science to planning was well-timed and was able to counter some of the seeds of doubt that had been sown. Mitchell went on to state: “Even in its early stages, regional science can permit us to revolutionize planning method” (Mitchell 1961, 7). Like other Presidential Addresses, Mitchell’s paper was published in the Papers of the Regional Science Association (Mitchell 1961). Significantly, however, a very similar version found its way into the American Institute of Planners Journal, thus ensuring that Mitchell’s message was received by a wider cross section of academics and practitioners.

4

Testing and Demonstrating Regional Models

Soon after arriving at Penn, Isard established a PhD program in regional science and 2 years later, in 1958, was able to found a Department of Regional Science and a new journal, the Journal of Regional Science. In the 1960s, the new Department set about establishing a series of projects intended to demonstrate how regional models could be constructed and applied. Many of the projects were undertaken in conjunction with the newly established Regional Science Research Institute. Three examples have been chosen here to illustrate the type of demonstration projects that were undertaken. Two were based at Penn and focused on the Philadelphia region, while the third was carried out over a number of years in Nova Scotia, Canada, by Stan Czamanski from Cornell University. At a time when the field was rapidly gaining new followers, these were cutting-edge studies that gave regional scientists something to aspire to, while at the same time recognizing there were likely to be pitfalls along the way.

4.1

The Philadelphia Regional Input-Output Study

The first of these projects was the Philadelphia Regional Input-Output Study, designed to test the feasibility of building a highly detailed (500-sector) regional input-output model. Starting in 1962, the project was initially intended to explore the impact of the National Aeronautics and Space Administration (NASA) upon the Philadelphia economy but later evolved into a series of wider investigations of the impact of federal agency expenditures. Most notable is a study concerned with the impact of Vietnam War expenditures. The Philadelphia Regional Input-Output Study was extremely ambitious, especially given the modest computing facilities available at that time: never before had anyone attempted to build a regional model on such a large scale, and many practical lessons were learned along the way. These were carefully documented, as it was intended that other researchers should have access to the data for their own research. Interestingly, the direct-coefficients input-output table itself was made available, in hard copy form and for a minimal charge, to other researchers in the field, in what was essentially a very early example of data sharing.

The project as a whole must be regarded as a mixed success: a big achievement in terms of data collection and handling but probably too ambitious, and too long-drawn-out, to be useful to the bulk of regional scientists at the time. Some years later the experience, good and bad, was recorded in a book, described by Isard in the Foreword as "a book which brings regional science down to earth and firmly roots it in the world of reality" (Isard and Langford 1971, v). The Philadelphia Study was like no other, before or since, in its close attention to the nitty-gritty aspects of data collection and handling and was, in its own way, a landmark in the development of applied regional science.
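
The core calculation behind an input-output study of this kind can be sketched in a few lines; the Philadelphia table had some 500 sectors, but the arithmetic is the same. The three-sector direct-coefficients matrix below is hypothetical, invented purely for illustration.

```python
import numpy as np

# Hypothetical 3-sector direct-coefficients (technical coefficients) matrix A:
# A[i, j] = dollars of input from sector i needed per dollar of sector j's output.
A = np.array([
    [0.10, 0.25, 0.05],
    [0.20, 0.05, 0.10],
    [0.05, 0.10, 0.15],
])

# Leontief inverse: total (direct plus indirect) requirements per dollar of final demand.
L = np.linalg.inv(np.eye(3) - A)

# Column sums give simple output multipliers for each sector.
print("Output multipliers:", np.round(L.sum(axis=0), 3))

# Gross output x needed to satisfy a final-demand vector f, via x = L f.
f = np.array([100.0, 50.0, 25.0])
print("Required gross output:", np.round(L @ f, 1))
```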

4.2

The Penn-Jersey Transportation Study

The second example is also based on Philadelphia: the Penn-Jersey Transportation Study, covering the nine-county Camden-Philadelphia-Trenton metropolitan region. Begun in 1959, the Study owes much to Robert Mitchell, who wrote the study brief. At about the same time, Mitchell was advising a US Presidential Committee on land use transportation planning (Mitchell 1959) and was therefore well informed about developments elsewhere, such as Chicago, Detroit, and Pittsburgh. Mitchell's brief for the Penn-Jersey Study placed particular emphasis on the two-way relationship between land use and transportation, so that:

. . . the recommended transportation system should provide convenience and economy of travel, but also that its influence on the development of the area should tend toward facilitating a desired pattern of regional development. Stress is to be laid on the design and analysis of alternative patterns both of possible transportation systems and of the future regional development likely to be associated with each system. (Penn-Jersey Transportation Study 1959)

The Penn-Jersey Study was well funded and viewed at the outset by the Federal Government and the Bureau of Public Roads as both a research and a practical planning effort. There was an expectation that, rather than being a one-off exercise, the Penn-Jersey Study would provide the springboard for a permanent regional planning process (Fagin 1963). It soon emerged that the value of the Penn-Jersey Study lay in a series of research experiments, each of them ambitious and innovative, leading in due course to fruitful academic research on urban modeling. There is little or no evidence, though, that the Study had any direct impact on planning practice. Britton Harris, seconded from the City and Regional Planning Department at Penn, had a big influence on the design and execution of the research program. Early in the Study, a scoping exercise gathered information about concepts and techniques being applied in other metropolitan planning studies. In the Penn-Jersey Study, there would be much more emphasis on developing a suite of models covering land use, transportation, and their interaction.

Today, the Penn-Jersey Study is perhaps best remembered for the so-called Herbert-Stevens Model, developed by Penn regional science doctoral student John Herbert and his supervisor Ben Stevens. In a pioneering application of linear programming, the model distributes households to residential land in an optimal configuration. It was part of a larger model designed to locate all types of land-using activity (Herbert and Stevens 1960). As far as planning is concerned, the Penn-Jersey Study is important because of its wider influence on integrated land use transportation planning, in technical and conceptual terms, rather than its immediate effect upon the planning of the Penn-Jersey Study area. And like the Philadelphia Regional Input-Output Study, because it was ambitious and used largely untried techniques, it was very much a matter of learning lessons as the Study went along. The Study did serve to demonstrate what could potentially be achieved using a large-scale regional model even though in practice planners had to settle for less.
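
To give a feel for the kind of optimization the Herbert-Stevens model pioneered, here is a minimal linear programming sketch in the same spirit: households of two groups are allocated across three zones so as to maximize aggregate locational surplus, subject to the land available in each zone. All numbers are hypothetical, and the formulation is a simplified stand-in for, not a reproduction of, Herbert and Stevens' published model.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: 2 household groups, 3 residential zones.
# surplus[h, z]: locational surplus when a household of group h locates in zone z.
surplus = np.array([[3.0, 2.0, 1.0],
                    [1.5, 2.5, 2.0]])
households = np.array([400.0, 300.0])   # households to place, by group
land = np.array([250.0, 300.0, 200.0])  # acres available, by zone
lot = np.array([[0.5, 0.4, 0.3],        # acres consumed per household, by group and zone
                [0.4, 0.3, 0.2]])

# Decision variables x[h*3 + z] = households of group h assigned to zone z.
c = -surplus.ravel()  # linprog minimizes, so negate to maximize total surplus

# Equality constraints: every household in each group is allocated somewhere.
A_eq = np.zeros((2, 6))
A_eq[0, 0:3] = 1.0
A_eq[1, 3:6] = 1.0

# Inequality constraints: land used in each zone cannot exceed supply.
A_ub = np.zeros((3, 6))
for z in range(3):
    A_ub[z, z] = lot[0, z]
    A_ub[z, 3 + z] = lot[1, z]

res = linprog(c, A_ub=A_ub, b_ub=land, A_eq=A_eq, b_eq=households)
print(res.x.reshape(2, 3))  # optimal assignment of each group to each zone
```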

4.3

Czamanski’s Regional Studies in Nova Scotia

The third and final example is based on Stan Czamanski's regional studies in Nova Scotia. Czamanski, born in Poland, had a remarkably broad education, including textile technology in Lodz, foreign trade and textile technology in Vienna, economics and social science in Geneva, and philosophy in Jerusalem, culminating in a PhD in regional science in 1963 with a thesis that developed an early regional econometric model. His previous career had included spells as an urban and regional planner in his native Poland and as the production manager of a textile company in Israel. After gaining his doctorate at the age of 45, Czamanski was appointed to a post in the Department of Regional Science at Penn. His academic career developed quickly from that point on, and he moved shortly afterward to Cornell University, working with colleagues to establish the field of regional science there. At the same time, Czamanski began what proved to be a 10-year research collaboration (1966–1976) with the Institute of Public Affairs at Dalhousie University, Nova Scotia, in Canada, spending his summers there working on a series of regional model building studies, as he put it ". . . testing and sharpening the tools of regional analysis" (Czamanski 1972, xvii). Czamanski went on to explain the purpose of his research studies:

My work in Nova Scotia . . . provided me with a long-sought opportunity not only of experimenting under realistic conditions with methods of analysis but of expanding an hypothesis of urban growth, focusing on locational attractiveness and investment equilibrium as significant determinants of progress. (Czamanski 1972, xvii)

Nova Scotia was a stagnating region: two of the principal industries – coal mining and iron and steel – were facing serious problems, per capita income was low, and labor force participation rates were low too. Czamanski found that while there were many planning initiatives, there was a distinct lack of coordination at the regional level, with the result that there were no agreed-upon planning objectives. This meant that the set of regional studies had to be designed to provide a framework and be capable of answering a host of pertinent questions. Czamanski had to anticipate the sort of questions that might be relevant to the policy makers whose job it was to deliver a better future for the province. He identified four main topics to be addressed in his studies:

Nova Scotia was a stagnating region: two of the principal industries – coal mining and iron and steel – were facing serious problems, per capita income was low, and labor force participation rates were low too. Czamanski found that while there were many planning initiatives, there was a distinct lack of coordination at the regional level, with the result that there were no agreed-upon planning objectives. This meant that the set of regional studies had to be designed to provide a framework and be capable of answering a host of pertinent questions. Czamanski had to anticipate the sort of questions that might be relevant to the policy makers whose job it was to deliver a better future for the province. He identified four main topics to be addressed in his studies: • An attempt to assess and measure backwardness and decline: low per capita output; per capita income; it was important to consider the role of noneconomic goods, but this would be difficult to measure.

• Industrial development: whether to try to attract new industries by investing public funds in infrastructure or by providing direct subsidies, and deciding which industries to promote.
• Location within Nova Scotia: one growth point or several? What would be the cost of relocation and resettlement? What would be the value of abandoned assets?
• The costs of development to achieve socially desirable results: what factors determine the rate of economic advance? What is the optimal allocation of resources to promote growth? (Czamanski 1972, 6–7)

This list of topics proved very helpful in structuring Czamanski's regional studies. And planners in other stagnating regions were likely to find that the contents of the list also reflected the conditions they were facing themselves and so could identify with the research. Czamanski wrote up the results of his first 5 years' research in Nova Scotia in a book that was notably successful in demonstrating the relevance of the regional science techniques he used (Czamanski 1972). And more so than either of the Philadelphia studies, there was a refreshing degree of realism in the Nova Scotia work reported in the book, the contents page of which appears as Table 2.

Table 2 Contents page for Czamanski's Nova Scotia book
1. Introduction: Study Background
2. Locational Attractiveness: The Resource Base
3. Structure of the Regional Economy: Contribution of Income and Product Accounts
4. The Industrial Basis: Contribution of Location Analysis
5. Sectors and Linkages: Contribution of Input-Output Studies
6. Invested Capital and Infrastructure: Contribution of Wealth Accounts
7. A Simple Econometric Formulation
8. Conclusions
Source: Czamanski (1972)

5

Regional Science Methods and British Plan-Making

5.1

The Slow Take-Up of Social Science Among Planners

Compared with their American counterparts, British planners were slow to discover the social sciences and the potential they offered for enhancing and strengthening the plan-making process. Nevertheless, during and immediately after the Second World War, there were signs of change. In a series of experimental planning studies, small teams of planners were able to carry out regional surveys that drew upon geography, sociology, and economics and began to demonstrate the value of quantitative methods in the systematic analysis of data. In two areas in particular, in the English West Midlands and in Middlesbrough in North East England, planners were able to show what could be achieved with limited resources and raised the bar with respect to plan content, presentation, and methods. A fuller account of this work may be found in Batey (2018).

However, the results of the experiments filtered through very slowly to British planning practitioners. Lloyd Rodwin, the American land economist and sometime MIT colleague of Isard, spent a sabbatical in Britain in the early 1950s. In a classic paper (Rodwin 1953), Rodwin identified what he called the "Achilles Heel of British Town Planning." He was moved to comment as follows:

Rigorous socio-economic analysis and systematic research rank among the key weapons of the planners. They have been neglected, partly because many planners do not yet know how to use them. (Rodwin 1953, 34)

He attributed this to a general absence of research in planning, putting the blame partly on the universities where, it seemed to him, there was little or no interest. He also pointed to the almost complete neglect of the social sciences. Part of the answer to this problem lay in the system of planning education which was then firmly based on physical planning. An important development was the publication of the UK Government’s Schuster Report. This supported the opening up of planning education to social scientists. As a result, by the early 1960s, many more social scientists were being employed in the British planning system, and it became more realistic to expect plan-making to reflect this.

5.2

Isard’s Efforts to Establish a British Section

In this respect, Isard's early efforts to found a British Section of the RSA are instructive. In 1960 he had made a tour of Europe, stopping off for conferences and workshops along the way, intended to raise the profile of regional science. The time was ripe for new developments: the North American RSA was by now well established and on a stable footing; Isard had published his textbook, Methods of Regional Analysis, a showcase for regional science methods; and the RSA constitution now made allowance for the formation of sections. A meeting held at the LSE, in London, in July 1964 provided him with a platform to launch the new section, or so he thought. It attracted a big turnout, much bigger than Isard had anticipated or indeed wished for. Interest in regional planning was growing rapidly as a result of the actions of a recently elected Labour Government in creating a framework for regional planning. Half the audience of more than 100 attending the LSE meeting were planners concerned with solving practical regional problems. Isard's notes from the time (see Fig. 1) indicate that he attempted to align his presentation to their interests but that much of what he had to say undoubtedly went above their heads: in fact a repeat of what happened in the United States during the early days of the Regional Science Association. In this case, however, the consequences were more far-reaching, with several of the most influential people attending deciding to found their own organization, the Regional Studies Association.

Fig. 1 Isard’s notes for the lecture he gave in June 1964 when seeking support for a British Section of the RSA. (Source: Regional Science Association International Archives, Cornell University, consulted on 21 June 2017)

5.3

Brian McLoughlin’s Systems Approach to Planning

By the late 1960s, things began to change. British planners in the larger local authorities had begun working on a new type of strategic land use plan, the structure plan, and needed guidance on how these plans might be prepared. A series of articles by Brian McLoughlin, published in the Town Planning Institute's professional journal, suggested how the planning process might be reinterpreted in the light of new thinking emerging from the United States. A paper delivered by McLoughlin to an audience of planning practitioners at the Town and Country Planning Summer School in Belfast in 1967 brought these ideas together in what McLoughlin called the "Systems Approach." He had recently been seconded from his post at Manchester University to direct a subregional planning study of Leicester and Leicestershire, a growth area in the English East Midlands, and this gave him and his planning team the opportunity to use some of the new techniques "for real." Like Robert Mitchell, McLoughlin had a mixed background in academia and planning practice. This increased the likelihood of planners listening to what he had to say. McLoughlin was clearly an admirer of Mitchell's work, commenting that:

No better description of the plan-making process has been given than that by Mitchell (1959) to a US Presidential Committee. (McLoughlin 1969, 253)

Alongside his work as Director of the Leicester and Leicestershire Study, McLoughlin was inspired to write a textbook about his Systems Approach, published in 1969. As one of a small number of British planners who kept up with the American planning literature, McLoughlin acted as a valuable conduit in communicating these new ideas to an audience of British planning practitioners. He used Mitchell's ideas about the continuous metropolitan planning process to structure the book, blending discussion of American work on land use transportation planning with ideas drawn from systems theory and cybernetics. His use of the term "Systems Approach" was new to planning at the time but was familiar to academics working in management science, thanks to the work of Stafford Beer, whom McLoughlin quotes frequently in his textbook. Moreover, it probably suited McLoughlin in promoting the book to use a phrase in the title that suggested a move away from traditional physical planning. The textbook was a great success, both in Britain and throughout much of the developed world, capturing the imagination of students and practitioners alike. It was not highly technical – in more than 330 pages, there is not a single equation – but nevertheless it did just enough to persuade his readers of what might be possible. For many readers it was perhaps enough to learn at the very least about a rational plan-making process and to recognize that interrelationships are important, whether in cities, in plan-making, or in making development decisions.

5.4

Alan Wilson’s Role in Advancing Planning Analysis

Having failed in his earlier attempts to found a British Section of the RSA, Isard tried again in the late 1960s, and this time was successful. He was assisted by Allen Scott, who at the time was seconded from his post at Penn to the Joint Unit for Planning Research at University College London. Working with David Harvey and Alan Wilson, Scott organized the first conference of the newly formed Section in 1967 and began to build up a network of British regional scientists. Scott shared Isard's view that regional science must be rigorous and challenging. This was reflected both in those invited to give papers and in the makeup of the audience. Significantly, this was a younger set, many of whom had done graduate work in North America and therefore knew something about regional science. There was hardly any overlap with the larger group that had rejected Isard's initial plan to form a British Section at the LSE meeting in 1964. The conference programs were dominated by presentations from mathematicians, transportation analysts, statisticians, and operational research specialists, as well as quantitative geographers and numerate planners. Interestingly, unlike regional science in North America, other parts of Europe, and Japan, there were comparatively few economists, at least in the early days of the Section. On the other hand, there was a strong representation of planning academics and practitioners. Partly this reflected the interest in analytical techniques associated with the new structure planning. Some, no doubt, would have been influenced by McLoughlin's textbook and had gone on to teach themselves the mathematical, statistical, and computing skills needed to apply the new methods.

Just as important, however, was the academic leadership of Alan Wilson, who served as Chair of the British Section for its first 7 years. A mathematician by training, Wilson found his route into planning and regional science via transport modeling. For a short time, he led the Mathematical Advisory Unit in the UK Government Ministry of Transport, where his biggest project was to develop a suite of models for use in transport planning. The test-bed for this work was the Greater Manchester metropolitan area, then known as SELNEC (South East Lancashire and North East Cheshire). The development of the SELNEC model paralleled similar developments in the United States and was notably successful in its application to land use transportation planning (Boyce and Williams 2015). Wilson's main technical contribution at the time was in the development of spatial interaction models derived using an entropy-maximizing procedure.

Wilson soon went on to lead the Centre for Environmental Studies in London and from that base recruited a team of researchers to work on planning techniques. And when he moved to a Chair in Geography at Leeds University in the early 1970s, he maintained this research interest in planning analysis, while at the same time developing a strong research and development program based on applications of regional science methods in business. One of Wilson's other major contributions was as the founding editor of Environment and Planning, first published in 1969. For many years this journal focused on analytical approaches to planning, setting the same high standards of rigor that Isard had encouraged when developing regional science and the RSA.

Throughout the 1970s, McLoughlin's book continued to influence British planners, who developed an appetite for regional science methods to the extent that an active user community emerged. By the early 1980s, however, some of the novelty of regional science methods had worn off, at least among planners working in local authorities. It prompted Batty in 1982 to remark: "It is an open question as to whether planning will swing back to such a technical approach during the next decades" (Batty 1982, 257).
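
The flavor of Wilson's entropy-maximizing spatial interaction models can be conveyed with a small worked example. The sketch below implements the standard doubly constrained trip-distribution model, T_ij = A_i B_j O_i D_j exp(-beta c_ij), with hypothetical zone data and an assumed value of the cost parameter; the balancing-factor iteration follows the familiar textbook procedure rather than any particular SELNEC implementation.

```python
import numpy as np

# Hypothetical zones: origin workers O, destination jobs D, travel costs c_ij.
O = np.array([500.0, 300.0, 200.0])
D = np.array([400.0, 400.0, 200.0])
cost = np.array([[1.0, 2.0, 3.0],
                 [2.0, 1.0, 2.0],
                 [3.0, 2.0, 1.0]])
beta = 0.8  # cost-sensitivity parameter, normally calibrated to observed trip lengths

F = np.exp(-beta * cost)  # negative-exponential deterrence matrix

# Balancing factors A_i and B_j are found by iterating until both margins hold.
A = np.ones(3)
B = np.ones(3)
for _ in range(100):
    A = 1.0 / (F @ (B * D))
    B = 1.0 / (F.T @ (A * O))

T = (A * O)[:, None] * (B * D)[None, :] * F
print(np.round(T, 1))           # predicted flows
print(np.round(T.sum(axis=1)))  # row sums reproduce O
print(np.round(T.sum(axis=0)))  # column sums reproduce D
```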

5.5

Pragmatic Practitioners

The answer to Batty's question is not as straightforward as might first appear. The interest of many planning practitioners in methods is a pragmatic one, based on identifying tools that will yield useful results without adding significantly to the time taken to prepare a plan or develop a policy. Unlike academic researchers, practitioners are generally not impressed by the quest for greater technical sophistication. So the planner's toolbox included not only the types of analytical methods to be found in Isard's 1960 textbook but also a much wider range of methods, quantitative and qualitative, supporting the different stages in the (rational) plan-making process necessary to prepare a plan or keep it up to date. They are systematic, formal, and documented so that, if necessary, their results can be replicated. This broad coverage is well represented in Bracken's (1981) planning methods textbook. Bracken identifies three main types of method: methods for urban planning and policy making, methods for urban research, and methods for urban policy analysis. The toolbox encompasses forecasting methods, simulation models, methods for generating and testing alternatives, a whole range of evaluation methods, methods for plan monitoring and review, methods for various types of impact analysis, and methods for analyzing, manipulating, and presenting geographical information. While some of these methods may be a product of planning research, it is just as likely for them to be drawn from another discipline, such as economics, sociology, operational research, quantitative geography, landscape architecture, management science, or geographical information science. This willingness to absorb methods from other disciplines may be considered one of the strengths of planning at the time.

The second point to make is that the answer to Batty's question will vary from one field of planning to another. In transport planning, for example, there has been a long-standing commitment to the development and use of quantitative modeling suites, linked to the need to make forecasts of travel demand and to consider how best to cater for these future needs (Boyce and Williams 2015). Techniques have continued to be refined, often by drawing upon research from allied fields such as operational research, applied mathematics, and quantitative geography. Transport modeling has always been a specialized activity, largely in the hands of teams of consultants augmented by planners and engineers recruited for particular projects and plan-making exercises.

The experience of local authorities making statutory strategic land use plans is somewhat different. Thirty or so years ago, a local planning authority would probably have had a small team capable of carrying out technical tasks in-house. In larger local authorities, the team would contain specialists with expertise in different aspects of the plan they were preparing and of the range of analytical methods available: population, housing, transport, economic activity, retailing, and so on. Nowadays it is much more likely that a firm of consultants would be commissioned to do this work on behalf of the local authority. This applies especially to one of the most contentious aspects of plan-making, the forecasting of housing needs. This task would probably be carried out by firms of consultants that have developed their own modeling packages for this purpose. As Fig. 2 shows, consultants now offer a comprehensive range of products, each aimed at meeting specific technical requirements, for example, FEMAplan: Defining Functional Market Areas and Integrate: Assessing Development Needs through an Integrated Evidence Base. The consultant planning analyst will probably be performing a similar set of tasks for a number of local authorities. The onus is therefore on each local authority to ensure that the commissioning of technical work is done well and meets its specific requirements. This could prove problematical if the local authority does not have planners with at least some technical knowledge and the ability to think in methodological terms.

Fig. 2 A consultant's prospectus, 2019. (Source: https://lichfields.uk/content/?services=gis-spatial-analytics)

A third case is drawn from an actual regional planning situation and illustrates the intelligent use of regional science methods within the policy-making process.
It concerns a regional development agency that took a decision in the early 2000s to produce short- and long-term economic forecasts, with the twin purpose of informing the regional business community and supporting its own economic development strategy. The starting point here was a regional econometric model, developed by a specialist consulting firm that was able to draw upon the latest technical developments as a result of its close links with university researchers. The development agency did not have the means to develop its own forecasting model but instead chose what it considered the best from among several available models. A second consulting firm was employed to manage the forecasting process, providing a preliminary assessment of the model forecasts, which was then placed before a regional economic forecasting panel of senior business executives who were able to add value to the initial forecasts as a result of their detailed understanding of their own sector and the regional economy as a whole. Here the regional science method – the econometric model – was used in conjunction with professional knowledge and expertise in order to provide one of several inputs to the development agency's policy-making process.

These three examples confirm that regional science methods remain a part of the planner's tool kit, although nowadays with a more mature appreciation of what methods can and cannot do. The expertise to develop and apply these methods is largely vested in consultants that have produced what can only be described as a "product range." Suites of models are available for most of the analytical tasks planners need to perform, a huge step forward but bringing with it the obvious risk that, by adopting standard methods, there is little incentive to innovate. This is a pity given the opportunities presented by big data and the massive increases in computing power.

5.6

Methods of Regional Analysis Revised: A Missed Opportunity?

In the early days of regional science, the textbook Methods of Regional Analysis offered a good guide that covered a large proportion of the methods then available.

Over time, inevitably, the book became obsolete. Recognizing this, in the 1990s Isard assembled a team of regional scientists to produce a new edition. After a writing process that took several years, the new book was eventually published in 1998 (Isard et al. 1998). However, unlike its predecessor, which had a wide readership, the new book was aimed specifically at graduate students. By this time, the field had grown to the extent that it was impossible to encapsulate regional science methods in a single volume in enough depth to provide the foundation for advanced research. Nearly 40 years after the first edition, much had changed, notably in the development of econometrics and spatial microsimulation, and both of these topics are given detailed treatment in the new volume. And while there is some mention of GIS, it hardly begins to scratch the surface of the subject. The title of the book, Methods of Interregional and Regional Analysis, now reflects a much fuller coverage of interregional analysis, an important distinguishing characteristic of regional science.

Disappointingly little is said in the new book about the methods derived from disciplines that contribute to regional science, other than economics. This is particularly so for planning, which is almost entirely ignored. Fortunately, alternative textbooks are available: in recent years, a planner seeking a general treatment of quantitative planning methods would have the option of consulting Wang and vom Hofe's 2007 book (Wang and vom Hofe 2007). This too is disappointing, however, in that the coverage of methods sticks very closely to those that have been available since the very early days of regional science.

The second edition of Isard's book is symptomatic of a widening gap between regional science and the needs of what has been termed "regional practice" in planning and economic development. Rodney Jensen took this as the main theme of his RSA Presidential Address, delivered in 1990 (Jensen 1991), complaining that many regional scientists were self-indulgent, insular academics who had largely shut themselves off from the real world of policy-making and professional practice, with the result that they had a poor grasp of research priorities and how these could and should be met. Unless they did more to address this problem, their work was in danger of becoming irrelevant.

6

Conclusions

This chapter has shown that planners were very much to the fore in the early discussions about regional science. Planning was seen as a valid and worthwhile destination for much of the analytical work carried out by regional scientists. A subfield of applied regional science developed, focusing on the one hand on policy-focused work connected with particular regional problems and, on the other hand, on methods-focused work intended to increase analytical understanding of cities and regions in ways that would ultimately aid practical plan- and policy-making.

Nowadays, while there continues to be a thriving research interest in matters to do with policy, methods-based work directed at planning practitioners has all but disappeared from regional science calls for papers and conference programs. Instead, interest in methods has moved elsewhere and is nowadays much more likely to be found in conferences organized for planning educators and researchers. The AESOP (Association of European Schools of Planning) Congress in Europe, for example, regularly has a track devoted entirely to "Methods." In inviting papers for the 2018 meeting in Gothenburg, Sweden, the Methods track placed particular emphasis upon harnessing new technology to support plan-making. Keywords in the call for papers specify: analytical tools, planning support systems, decision support systems, smart approaches, ICT, open source data, mobile apps, social media, generic tools, public participation GIS, visualization methods, rationality, data, and information. Arguably, these are all topics that would benefit from the kind of interdisciplinary dialogue that the RSA was created to encourage. The various camps have much to learn from one another.

An excellent opportunity to do this is presented by the recent publication of a new planning methods textbook, Planning Support Methods. Written by Klosterman and his co-authors (Klosterman et al. 2018), it is claimed to be only the third planning methods textbook to be published in the last 30 years. It replaces one of these texts, Klosterman's Community Analysis and Planning Techniques (Klosterman 1990). The new book is aimed at undergraduates and at planning practitioners, retaining and expanding the demographic and economic analytical and projection methods found in Klosterman's earlier book, while at the same time adding substantial new chapters on spatial analysis and land suitability analysis. The title's reference to "planning support methods" makes it clear that these are practical methods, each of which has a specific function within the plan-making process. The book uses a real-life county and city, DeKalb County and Decatur, Georgia, throughout to illustrate the methods being described, and there are appendices that provide a guide to American data sources and are intended to encourage students to test the methods themselves. Klosterman developed a collection of Excel workbooks that implement the methods in the new book, available at https://planningsupport.org/tools/.

Regional scientists can, and should, learn from the approach adopted in this book, carrying the message that it is very often better for planners to use simple, understandable, and easy-to-use methods than to aim for ever-greater levels of sophistication. There are clear signs, however, of greater recognition of the need to make the most sophisticated regional science models user-friendly, in Jensen's words, bringing regional science and regional practice closer together. A good example is that of urban models, where over the years some of the most important advances in method have occurred. Whereas in the past such models would have been severely limited in their level of spatial and topical detail, nowadays there is scope, as a result of dramatically increased computing power, to develop extremely detailed models covering an entire country. The so-called QUANT model, currently being developed by Michael Batty and colleagues at CASA, University College London, provides a foretaste of what urban models may be capable of in future: a means of testing large-scale scenarios relating to population, economic development, and new transport schemes. Despite its size and complexity, the QUANT model is designed to be much more user-friendly than any previous urban model.
In practice this means a user interface that can be accessed from any place where there is an internet connection, and advanced visualization techniques that present the model's results in real time. This model, and others like it, promises to open up exciting new opportunities for nonspecialist planners to engage with even the most advanced regional science methods.

7

Cross-References

▶ A Reconnaissance Through the History of Shift-Share Analysis
▶ Regional Science Methods in Business
▶ Worlds Lost and Found: Regional Science Contributions in Support of Collective Decision-Making for Collective Action

References
Batey PWJ (2018) The history of planning methodology. In: Hein C (ed) The Routledge handbook of planning history. Routledge, London, pp 46–59
Batty M (1982) Planning systems and systems planning. Built Environ 8(4):252–257
Boyce D, Williams H (2015) Forecasting urban travel: past, present and future. Edward Elgar, Cheltenham
Bracken I (1981) Urban planning methods: research and policy analysis. Methuen, London
Czamanski S (1972) Regional science techniques in practice. Heath, Lexington
Fagin H (1963) The Penn-Jersey transportation study: the launching of a permanent regional planning process. J Am Inst Plann 29(1):9–18
Feld MM (2011) Martin Meyerson: building the middle range bridge to educating professional planners: an appreciation and reminiscence. J Plan Hist 10(3):239–248
Haggett P (1965) Locational analysis in human geography. Arnold, London
Herbert JD, Stevens BH (1960) A model for the distribution of residential activity in urban areas. J Reg Sci 2(1):21–36
Isard W (1956) Location and space economy. MIT Press, Cambridge
Isard W (1979) Notes on the origins, development and future of regional science. Papers Reg Sci Assoc 43:9–22
Isard W (2003) History of regional science and the regional science association international: the beginnings and early history. Springer, Berlin
Isard W, Langford TW (1971) Regional input-output study: recollections, reflections and diverse notes on the Philadelphia experience. MIT Press, Cambridge
Isard W et al (1960) Methods of regional analysis: an introduction to regional science. MIT Press, Cambridge
Isard W et al (1998) Methods of interregional and regional analysis. Ashgate, Aldershot
Isserman AM (2009) Book review essayist: Andrew M Isserman. J Am Plan Assoc 75(2):265–266
Jensen RC (1991) Presidential address: Quo vadis regional science. Papers Reg Sci 70(2):97–111
Klosterman RE (1990) Community analysis and planning techniques. Rowman and Littlefield, Savage
Klosterman RE, Brooks K, Drucker J, Feser E, Renski H (2018) Planning support methods: urban and regional analysis and projection. Rowman and Littlefield, Lanham
McLoughlin JB (1969) Urban and regional planning: a systems approach. Faber, London
Meyerson M, Banfield EC (1955) Politics, planning and the public interest. The Free Press, Glencoe
Mitchell RB (1959) Metropolitan planning for land use and transportation. Office of Public Works Planning, White House, Washington, DC
Mitchell RB (1961) On paradigms and paradiddles: a city planner looks at regional science. Papers Reg Sci Assoc 7:7–15
Mitchell RB, Rapkin C (1954) Urban traffic: a function of land use. Institute for Urban Land Use and Housing Studies, Columbia University, New York
Penn-Jersey Transportation Study (1959) Prospectus. Penn-Jersey Transportation Study, Philadelphia
Rodwin L (1953) The Achilles heel of British town planning. Town Plan Rev 24:23–26
Wang X, vom Hofe R (2007) Research methods in urban and regional planning. Springer, Berlin

A Reconnaissance Through the History of Shift-Share Analysis

Michael L. Lahr and João Pedro Ferreira

Contents
1 Introduction
2 Shift-Share Early Developments
3 Shift-Share Analysis as a Forecasting Tool
4 Finding Meaning in the Drift Component
5 Limits of Time and Dynamic Shift-Share Analysis
6 The Nation Need Not Be the Referent Area
7 The Problem of Productivity Differentials
8 Digging Deeper into Industry Shifts
9 Beyond Traditional Shift-Share Analysis: Structural Decomposition Analysis
10 Shift-Share in a Regression Framework
11 Application to Other Topics
12 Conclusions
13 Cross-References
References

Abstract

Some research methods come and go, and others persist through time. Despite the lack of apparent theory supporting shift-share analysis, we find that it is still alive and well – in fact, it is perhaps in greater use now than ever before. This chapter takes the reader on a reconnaissance of several themes in the extant literature on shift-share. We start with its earliest history in the Barlow Report. We then scan researchers’ hopes for it as a forecasting tool and their attempts to draw explanatory power from it. We identify who first expanded shift-share’s temporal and spatial dimensions. We also examine extensions that cover spatial and intersectoral spillovers. Our story winds up by pointing out analytical equivalents in regression and input-output analysis techniques. We hope this points to avenues along which researchers may direct its future development.

Keywords

Shift-share analysis · Regional growth · Regional analysis

1 Introduction

Shift-share analysis (SSA) derives from the perspective that a region “develops because the nation, of which it is a part, develops” (Isard 1960, p. 545). The method’s long-run success undoubtedly stems from “the fact that the statistical information required is very elementary and the analytical possibilities that it offers are quite large” (Esteban-Marquillas 1972, p. 249). This analytical tool continues to be taught and used, despite a large number of detractors over time; that is, it still plays an important role in regional planning and analysis.

Fortunately, SSA has a rich history, often grounded in fascinating debates about the mathematics of its components and their economic meaning. Early on, some of these debates got rather intense, occasionally resulting in new proposals and extensions to the approach. We detail the most important pieces herein. Even though some of the approaches presented were wrongheaded and eventually abandoned, SSA’s ready application undoubtedly engaged a wide range of users and spectators. Most of this occurred when collecting and collating data, and then applying analytical tools to them, were far less easy than they are today. Nowadays SSA is adopted within much more sophisticated frameworks. At the same time, tools inspired by shift-share, like structural decomposition analysis, have greatly increased the popularity of the general approach due to the growing availability of data.

This is no full-blown compendium that reviews all works with shift-share content. Such a review would be overly lengthy and nigh unto impossible since, at the date of writing this article, a single original piece alone (Dunn 1960) has been cited by nearly 800 published articles. Instead, we present a retrospective reconnaissance through the salient literature, limiting the number of equations and, where possible, using plain language to discuss the various historical contributions. Those seeking a more detailed mathematical treatment may wish to turn to overviews by, e.g., Stevens and Moore (1980) and Mulligan and Molin (2004).

2 Shift-Share Early Developments

Shift-share analysis (SSA) began as a rough way to identify possible components of change in a regional setting. Ray, Hall, and O’Donoghue (2019) note that SSA seems to have been first used in the so-called Barlow Report. They tell us the analytical work in this foundational document to British urban and regional planning was undertaken by G.D.A. (Sir Donald) MacDougall, although ascribed to Jones (1940). In it, MacDougall discussed the “structural effect” of overall industrial growth, which shows the degree to which growth in various areas of Great Britain could be postulated from the area’s industry mix and national growth rates of those industries (see also MacDougall 1940). Nonetheless, Daniel Creamer (1943, p. 85) appears to have been the stimulus for academic work after publication of his description of the “measurement, industry by industry, of the divergence of the trend of employment in individual States from that of the Nation as a whole.” He calls a “locational shift” an event that “occurs when changes in the employment in a given industry in a particular State differs in degree from the change in the same industry on a national basis” (Creamer 1943, p. 87). Regional analysts use this concept today.

While the above contributions were devoid of equations, later early work further detailed the approach. The main SSA equation includes variables (or components) that provide proximate explanations of how employment evolves over a specified period for a specific industry in a specific region. That is, it is an attempt to explain employment of industry i in region R in period t ($e_{iR}^t$) via the region’s “national share” of employment over time and the region’s divergence from the national “industry mix” over time – the “regional or locational shift” – with the remainder being the specialized drift for the region in industry i:

$$e_{iR}^t = NS + IS + \varepsilon \qquad (1)$$

“National share” (NS) in Eq. (2) measures employment in region R as if it had changed at the same pace as total national employment. Here, $e_{\bullet\bullet}^0$ and $e_{\bullet\bullet}^t$ are national employment at times 0 and t, respectively, and $e_{\bullet R}^0$ is region R’s total employment at time 0.

$$NS = e_{\bullet R}^{0}\left(\frac{e_{\bullet\bullet}^{t}}{e_{\bullet\bullet}^{0}}\right) \qquad (2)$$

Equation (3) represents the “industry shift” (IS) of the national share. That is, it adjusts for industry-specific growth at the national level but respects region R’s difference from the nation with respect to industry mix (industry shares). The differences between each industry’s national growth and overall national employment growth are thereby identified via this component.

$$IS = e_{iR}^{0}\left(\frac{e_{i\bullet}^{t}}{e_{i\bullet}^{0}} - \frac{e_{\bullet\bullet}^{t}}{e_{\bullet\bullet}^{0}}\right) \qquad (3)$$

A positive IS indicates that the particular industry prospered nationally compared to the national average; a negative-valued IS means that industry i failed to keep pace with other industries nationwide.

Finally, Eq. (4) shows the industry-specific regional drift of i in R. It measures how much R’s employment in industry i differs from expectations derived from national growth. Dunn (1960) called it the “differential shift” and later the region’s “competitiveness effect,” since it shows how the region’s industry gained or lost share vis-à-vis the nation’s.

$$\varepsilon = e_{iR}^{0}\left(\frac{e_{iR}^{t}}{e_{iR}^{0}} - \frac{e_{i\bullet}^{t}}{e_{i\bullet}^{0}}\right) \qquad (4)$$

Table 1 Traditional representation of SSA

              Time 0           Time t              Δ                 %
              R      Nation    R      Nation     R      Nation     R      Nation
Industry 1    400     800      450     920        50     120       13%     15%
Industry 2    200     500      175     400       −25    −100      −13%    −20%
Industry 3     75     700       70     800        −5     100       −7%     14%
TOTAL         675    2000      695    2120        20     120        3%      6%

              NS       IS        ε       $e_{iR}^t - e_{iR}^0$
Industry 1    24.0     36.0    −10.0       50.0
Industry 2    12.0    −52.0     15.0      −25.0
Industry 3     4.5      6.2    −15.7       −5.0

Table 1 exemplifies SSA using highly aggregated data for a hypothetical region. It shows that industry 1 and industry 3 have negative regional drift components (ε), while industry 2’s is positive. The implications of these components, and the extent to which local policy can affect changes in them, have been the subject of some debate among planners and regional scientists. We see from Table 1 that industries 1 and 3 had negative regional drifts (ε): industry 1 grew over period t ($e_{iR}^t - e_{iR}^0 = 50 > 0$) yet more slowly than national trends suggested, while industry 3 declined outright even as the industry grew nationally. Both essentially lost market share to other states; i.e., they lost some of the competitive advantage that they had within the state. Industry 2 displays a somewhat opposite trend. Its presence declined within the state ($e_{iR}^t - e_{iR}^0 = -25$), but its regional drift is positive (ε = 15). That is, despite losing jobs, industry 2 did not hemorrhage jobs as rapidly as the nation did; this region gained market share from other states, albeit in an industry in which the nation itself may be losing market share at an international scale.

The above is the traditional version, and there are rather simple variants of it. As with the location quotient (LQ), the base areal unit against which the local region is compared need only include the local region as a subcomponent. That is, while we have suggested it is the nation above, it could be a state, province, metropolitan area, or some other meta-region of which the target region is a sub-portion. In addition, there is no compelling reason to use employment by industry as the target regional measure. Very early on, Fuchs (1959, p. 4), for example, examined the evolution of US value added by industry instead of employment, as “it reflects the participation of men and machines in the transformation of matter and the creation of utility.”
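To make the accounting concrete, here is a minimal Python sketch (our own illustration; the variable names are not from any standard package) that reproduces the component values in Table 1 and confirms that national share, industry shift, and regional drift sum to the change in regional employment. Note that, as in Table 1, the national-share term is computed industry by industry as a growth increment.

```python
# Minimal sketch of the traditional shift-share decomposition using
# the hypothetical Table 1 data. Variable names are illustrative only.

region_0 = {"Industry 1": 400, "Industry 2": 200, "Industry 3": 75}
region_t = {"Industry 1": 450, "Industry 2": 175, "Industry 3": 70}
nation_0 = {"Industry 1": 800, "Industry 2": 500, "Industry 3": 700}
nation_t = {"Industry 1": 920, "Industry 2": 400, "Industry 3": 800}

g_nat = sum(nation_t.values()) / sum(nation_0.values())  # national growth factor (1.06)

for i in region_0:
    g_ind = nation_t[i] / nation_0[i]        # industry i's national growth factor
    g_reg = region_t[i] / region_0[i]        # industry i's regional growth factor
    ns = region_0[i] * (g_nat - 1)           # national share, as a growth increment
    shift = region_0[i] * (g_ind - g_nat)    # industry shift, Eq. (3)
    drift = region_0[i] * (g_reg - g_ind)    # regional drift, Eq. (4)
    change = region_t[i] - region_0[i]
    assert abs(ns + shift + drift - change) < 1e-9  # components sum to the change
    print(f"{i}: NS={ns:6.1f}  IS={shift:6.1f}  drift={drift:6.1f}  change={change:6.1f}")
```

Running this reproduces the NS, IS, and ε columns of Table 1 (e.g., 24.0, 36.0, and −10.0 for industry 1).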


Immediate subsequent methodological developments consisted of identifying an appropriate start date for the comparisons, including analyses of the possible differences that arise from choosing different time intervals. Indeed, Fuchs (1959) suggested using an intermediate date as a basis for comparison. Dunn (1960) noted that the choice of time interval can introduce important distortions. He suggested it could help analysts to split industry-based findings into inward shifts (those that are positive) and outward shifts (those that are negative), contending that this approach helps to better understand the target region’s relative competitiveness. He further noted that, as in the case of selecting the time interval for the analysis, the level of aggregation selected for industry analysis can obscure heterogeneity.

Clearly, SSA was gaining traction, despite some debate about its promise as a research tool. Ready application and accessibility were in its favor. It seemed to some to hold promise as a better way than simple linear projection to get an idea of future regional industrial performance. This was despite the fact that Ashby (1968) noted SSA simply compares the performance of regions’ economic growth against a common reference set. Nevertheless, researchers seemed to agree that if shift-share was not a forecasting tool at this point, it inevitably would be.

So, primordial SSA existed at the start of the 1960s. National share, industry shift, and regional drift could now be combined to yield a better understanding of the evolution of an industry located in a given region of a certain country. But could SSA become the forecasting tool the regional science community was seeking? The jury was out debating this issue during the following decade; the debate contributed to a number of improvements to SSA.

3 Shift-Share Analysis as a Forecasting Tool

Work in the late 1960s and early 1970s pushed SSA toward becoming a more accurate and popular forecasting tool. Stilwell (1969) focused on the industry mix component, identifying it as the main proximate cause of industrial growth in most of the regions in his work. That is, regions with a positive change in their employment tended to have employment concentrated in industries that grew rapidly nationally. Brown (1971) confirmed this finding and further asserted that the regional drift component suffers great instability, disabling analysts’ abilities to predict it. In this vein, Brown invalidated SSA as a forecasting tool by concluding that it generated poor forecasts. Paraskevopoulos (1971), however, noted that Brown (1971) violated Dunn’s (1960) point about identifying the proper period to which SSA ought to be applied, so that his words could not be deemed final on the matter.

James and Hughes (1973) contrasted SSA findings against those of a constant share model. They suggested that regional drift should be expected to decelerate, even become negative, as a region becomes specialized to the extreme in a given industry. It is not clear whether this expectation was due to input and labor shortages or to sudden declines in returns from agglomeration economies. Regardless, they note that positive regional drifts are unsustainable when they are persistent and become ever higher for more than a couple of decades. The implication is that, theoretically, SSA could be a reasonable short-run forecasting tool most of the time in most regions, but it is unlikely to work well where the degree of industry specialization is very high or where a given industry is practically absent. Borts and Stein (1964) had come to similar conclusions based simply on location-theoretic economic grounds. They noted that positive regional drift is based on specific locational advantages, which fade as prices of land, labor, capital, or other inputs start to increase. In other words, a region’s degree of industry specialization is never infinitely elastic.

Finally, in their review of SSA work, Stevens and Moore (1980) conclude that “it is hard to justify either a constant shift or a constant share assumption on theoretical grounds.” This summary judgment and the emergence of time-series econometric forecasting ultimately caused researchers to abandon the path to make shift-share a forecasting tool.

4 Finding Meaning in the Drift Component

SSA’s simplicity and the insight it provided kept it in the limelight insofar as planners and regional science practitioners were concerned. Its use continued, with hopeful advances in its formulation and form. This work often proved to be worthwhile, in part because Stevens and Moore’s (1980) review enabled a better understanding of shift-share fundamentals and, in so doing, identified some clear avenues for future work. One avenue was that much more had to be done if SSA could ever be a forecasting tool. Another was that statistical assessments should be made of the relation between regional drift and some set of locational socioeconomic and demographic variables. Their hope was that this might ground SSA in the theories of regional growth and development (e.g., agglomeration forces and market orientation). But a main point of Stevens and Moore was that improvements to SSA should be possible and welcomed.

Taking this as a cue, authors started to focus on the proper set of independent variables to be included in an SSA equation. A foundational critique had been formulated some years before by Rosenfeld (1959), who postulated that regional drift depended on the region’s industry mix. In essence, Rosenfeld stated that a given share of national growth of a specific industry in two different regions should result in a different regional drift due to the different industrial endowments at the outset of the period of analysis. This was a profound thought at the time that was largely ignored, perhaps because it was accessible only to Francophones. It suggested that no one had yet properly identified what the regional drift component was in socioeconomic terms. Ultimately Rosenfeld, perhaps unknowingly, was suggesting that if the regional drift of a given sector in fact displays any special dynamism, that dynamism is influenced by the magnitude of that industry’s connections with other industries within the region.


We use Table 2 to demonstrate Rosenfeld’s argument. In both region A and region B, industry 1 increases by 25% between time 0 and t. The two regions have the same total employment (1000 jobs). The national change in employment of industry 1 affects both regions equally. But the regional data reveal that the regional drift is higher in B than in A. The only thing that differs between the two regions is B’s greater concentration of employment in industry 1 (and, of course, its lesser concentration in all other industries). So, the regional drift must measure either some sort of special dynamism (1) between industry 1 and the other industries within a given region or (2) just among the other industries, or else (3) something capricious.

Table 2 Example of a type of Rosenfeld critique of SSA

              Time 0                       Time t                       %
              Region A  Region B  Nation   Region A  Region B  Nation   Region A  Region B  Nation
Industry 1      200       500      1200      250       625      1375      25%       25%      15%
Total          1000      1000      2500     1050      1125      2675       5%       13%       7%

              National share   Industry mix   Regional shift
Region A          14.0             15.2            20.8
Region B          35.0             37.9            52.1

Esteban-Marquillas (1972) suggested controlling for base-period homothetic employment (HE) – the number of jobs in a given region if the region had the nation’s industry mix – hoping to distill something meaningful from the regional drift component:

$$HE_{ij} = e_{\bullet j}^{0}\left(\frac{e_{i\bullet}^{0}}{e_{\bullet\bullet}^{0}}\right)$$

Stokes (1974) discovered that Esteban-Marquillas’s decomposition lacked aggregation-disaggregation symmetry, i.e., its shift-share components for all subregions failed to sum to that of the meta-region. This sort of symmetry guarantees that SSA results do not depend on the level of spatial or sectoral aggregation that is applied. Extolling its merits, Herzog and Olsen (1977) believed homothetic employment could somewhat ameliorate the problem of measuring inter-industry effects that are implicit in the regional drift component. They suspected that Esteban-Marquillas’s (1972) allocation effect improperly weighted the industry mix, a problem identified as early as Fuchs (1959). Akin to Laspeyres and Paasche indices, it appears SSA is extremely sensitive to whether industry mix is measured at the beginning or the end of the study period. Arcelus (1984, p. 6) subsequently pointed out that “growing regions are expected to affect the employment levels of the industries in their midst in ways different than stagnant or backward regions do.” So, by using HE, Arcelus similarly decomposed the regional drift component in a way that at least satisfied the property of aggregation-disaggregation symmetry. But Arcelus apparently had trouble deciphering his decomposition. Ultimately, researchers (cf. Loveridge and Selting 1998) identified empirical shortcomings of the homothetic concept, so it was subsequently abandoned.

5 Limits of Time and Dynamic Shift-Share Analysis

Perhaps some of the most meaningful improvements pertain to the time dimension of SSA. Thirlwall (1967) suggested subdividing SSA into two or more sub-periods across a pre-specified time interval. For example, using the national business cycle, it might be worthwhile to follow trends from trough to peak and from peak to trough. Of course, regions often have business cycles that are distinct from the nation’s (or other meta-region’s), so those also could be analyzed separately. Perhaps as a consequence of what seemed to be almost senseless slicing and dicing of time, Barff and Knight III (1988) suggested estimating SSA on an annual basis. So-called dynamic shift-share readily enables unusual pairings of years and economic transitions. Shi and Yang (2008) suggest that this dynamic approach is particularly well suited to longer study timeframes, especially when dealing with small regions or in the presence of large shifts in local industry mix. For example, Markusen et al. (1991) use this approach to its fullest to track the sensitivity of regional growth to international commodity flows. That is, they decompose value-added changes into import, export, and domestic market segments and a productivity component.
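As a rough illustration of the dynamic variant, the sketch below (our own construction, assuming yearly employment dictionaries like those used for Table 1) re-bases the static decomposition each year and accumulates the annual components across the study period, which is the essence of the annual-basis approach.

```python
# Illustrative sketch of dynamic shift-share: re-base the static
# decomposition every year and sum the annual components.

def shift_share(reg0, regt, nat0, natt):
    """One-period decomposition; returns {industry: (NS, IS, drift)}."""
    g_nat = sum(natt.values()) / sum(nat0.values())
    comps = {}
    for i in reg0:
        g_ind = natt[i] / nat0[i]
        g_reg = regt[i] / reg0[i]
        comps[i] = (reg0[i] * (g_nat - 1),      # national share increment
                    reg0[i] * (g_ind - g_nat),  # industry shift
                    reg0[i] * (g_reg - g_ind))  # regional drift
    return comps

def dynamic_shift_share(region_by_year, nation_by_year):
    """Sum annual components over consecutive years of employment data."""
    totals = {i: [0.0, 0.0, 0.0] for i in region_by_year[0]}
    for y in range(len(region_by_year) - 1):
        annual = shift_share(region_by_year[y], region_by_year[y + 1],
                             nation_by_year[y], nation_by_year[y + 1])
        for i, (ns, shift, drift) in annual.items():
            totals[i][0] += ns
            totals[i][1] += shift
            totals[i][2] += drift
    return totals
```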

6 The Nation Need Not Be the Referent Area

Sihag and McDonough (1989) extended the meta-region from the nation to the world, arguing that the magnitude of international trade demanded study at a global scale. While not necessarily embracing those thoughts fully, Loveridge and Selting (1998) noted that the same model could be applied to compare a region’s performance with intermediate levels of spatial disaggregation, for example, at the census region, state, or county level if the target area is a municipality. Indeed, if one is working on a region that is a county or smaller areal unit, a state or provincial comparison probably should be applied in addition to a national or global perspective. After all, policy tools are effected at that level of governance as well, and presumably planning agencies at the local level have some voice in policy actions implemented by their meta-regions. In the end, the most appropriate referent region depends on the research focus.

Through experience with spatial statistics and spatial econometrics via the likes of Cliff and Ord (1981), Dinc et al. (1998a) recognized that spatial dependencies are undoubtedly inherent to the regional drift components. Nazara and Hewings (2004) likewise understood that SSA’s regional drift is unlikely to be spatially independent. That is, the structure and performance of surrounding regions undoubtedly influence the evolution of a given industry in a particular region. Nazara and Hewings therefore use a spatially weighted version of the characteristics of the region’s neighbors as a proxy referent region. Rather than measure the national industry shift effect, they measure the difference between the spatially weighted growth rates of industries in immediately contiguous regions and those of the same industries for the nation; a new, different regional drift component is then generated from the difference between the same spatially weighted neighbors measure and the local industry’s growth. Many works followed using similar approaches with different sorts of spatial weights matrices (queen contiguity, rook contiguity, distance, travel time, etc.), as elaborated in prior statistical work by Klaassen and Paelinck (1972). Cheng (2011) notes that the regional drift components of such spatially dependent SSA can vary considerably depending upon the spatial perspective taken. In essence, the nature of the referent region and, in the case of spatial dependence, the spatial weights matrix used can make quite a substantive difference in SSA findings for a specific region. Of course, some degree of policy relevance to the target region should attach to the referent region as well.
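A hedged sketch of the spatial idea follows: a row-standardized weights matrix W defines each region’s “neighborhood,” and drift is measured against the neighborhood’s industry growth rather than the nation’s. The arrays and weights here are illustrative assumptions, not data from the cited studies.

```python
# Illustrative sketch of a spatially weighted referent in the spirit of
# Nazara and Hewings (2004). Rows of E0/Et are regions, columns industries.
import numpy as np

E0 = np.array([[400., 200.], [300., 150.], [250., 100.]])  # employment, time 0
Et = np.array([[450., 175.], [330., 160.], [240., 120.]])  # employment, time t
W = np.array([[0.0, 0.5, 0.5],                             # row-standardized
              [0.5, 0.0, 0.5],                             # contiguity weights
              [0.5, 0.5, 0.0]])

g_nbr = (W @ Et) / (W @ E0)   # industry growth of each region's weighted neighbors
g_own = Et / E0               # each region's own industry growth
drift_vs_neighbors = E0 * (g_own - g_nbr)  # drift measured against the neighborhood
```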

7 The Problem of Productivity Differentials

By focusing on employment change, traditional SSA unfortunately makes it easy for analysts to confound output growth with productivity improvements (Rigby and Anderson 1993). That is, while one might observe that a region’s employment is declining faster than the nation’s in an industry, it may be that its production levels (i.e., its level of output or net taxable income) are growing faster. This can occur if the industry’s productivity within the region is improving faster than the national equivalent. Thus, by not accounting for interregional differences in labor and capital productivity, analysts who focus strictly upon employment change will make false inferences about regional growth. Unfortunately, remedies to this condition compromise SSA’s valued simplicity.

8 Digging Deeper into Industry Shifts

Tracking the effects of industry change had always been a core element of SSA. But it also was understood by its originators (Rosenfeld 1959, in particular) that they were not grasping a key element of regional industry content – the magnitude of intraregional sectoral interactions, i.e., the extent of Hirschman’s (1958) linkages within the target region. Hirschman proselytized the concept of further developing backward and forward linkages within an economy as part of a strategy of unbalanced growth. He noted that forward linkages were created when a particular project or program encourages investment in subsequent related streams of production within an economy. Backward linkages were created when a project or program encourages investment in the supply chain within the economy that supports that project or program. Hirschman argued that investments should be made in those projects and programs that have the largest total linkages (forward plus backward). He further identified how linkages might be measured using input-output tables.

Ramajo and Márquez (2008) extend SSA to account for Hirschman’s linkages. Accordingly, they estimate change in value added from a new sectoral shift-share, one in which linkages are related to the industry mix in the analysis. The authors deem that a binary configuration of weights would indicate only whether the sector exists in the region or not; such an approach would ignore both the size of a given industry and the relative size of its linkages to the rest of the economy. They suggest that a viable notion of sectoral interaction in SSA must identify the intensity of an industry’s relationship to other sectors in the target economic system, and they therefore offer that sectoral interaction be defined as the set of flows among industries transmitted through the economic system. They thus developed a sectoral neighborhood analysis akin to that used by Nazara and Hewings (2004), but, rather than using spatial weights, they use sectoral weights from the direct requirements matrix in Leontief’s (1936) model for backward linkages and the direct allocation matrix in Ghosh’s (1958) model. They thereby apply two new components to SSA: (1) a sectoral structural effect that is attributable to Leontief weights of sectoral composition and (2) an industry distributional effect that is attributable to Ghosh weights of an industry’s sectoral disposition of its product. Using this same pair of measures in an econometric analysis of structural change and labor productivity in the EU, O’Leary and Webber (2015) show that intersectoral structural change appears to affect regional productivity growth and convergence.

9 Beyond Traditional Shift-Share Analysis: Structural Decomposition Analysis

Following a summary of the Ramajo-Márquez extension of SSA, it only seems natural to provide a discussion of the broader relationship between input-output (I-O) analysis and SSA. In fact, measuring changes in industry technology and, hence, changes in their mix within an economy was an original goal of Wassily Leontief, who first developed I-O analysis. He recognized early on that the gap between I-O tables of different vintages for an economy identifies structural change, once price changes are netted out of them (Leontief 1941). But the subject was not examined via a unified approach until researchers applied RAS and structural decomposition approaches to the issue.

RAS is the name currently given to the bi-proportional adjustment technique that became accessible to social scientists via Deming and Stephan (1940) and popularized within I-O analysis through various publications, most recently Miller and Blair (2009, Sect. 7.4). At least in early versions of the algorithm, this data-balancing technique required an initial, prior matrix and known updates to the sums of its rows and columns, which are referred to as “margin totals.” Naturally the prior matrix does not match the updated margins, so RAS is an iterative procedure. It forces first the rows and subsequently the columns of the prior matrix to change, using proportions that relate its margins (and the margins of its re-estimates as the proportions are successively updated) to the known margin updates. Miller and Blair (2009) summarize this process, noting that

$$A^{t} = \hat{r} A^{0} \hat{s}$$

where $A^0$ and $A^t$ are the prior and updated technology I-O direct requirements matrices; r is a vector that is the product of the successively applied row-wise adjustments; s is a vector that is the product of the successively applied column-wise adjustments; and the circumflex or hat (^) denotes that the vector is transformed into a diagonal matrix that has the vector on its diagonal and zeros elsewhere. Stone (1961) suggested that r reveals substitution effects across inputs and that s reveals fabrication effects, which not coincidentally seem related to shifts and shares, respectively. Ultimately, De Mesnard (1990) emphasized the economic interpretation of RAS by recognizing the gap between the starting and ending points of the iterative process and associating it with structural change.
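The mechanics are easy to sketch. The function below (a minimal illustration, not production code) alternately scales rows and columns of a prior matrix until the known margin totals are hit, returning the cumulative adjustment vectors r and s along with the updated matrix.

```python
# Minimal sketch of the RAS (bi-proportional) adjustment procedure.
import numpy as np

def ras(A0, row_targets, col_targets, tol=1e-10, max_iter=1000):
    A = A0.astype(float).copy()
    r = np.ones(A.shape[0])  # cumulative row-wise proportions
    s = np.ones(A.shape[1])  # cumulative column-wise proportions
    for _ in range(max_iter):
        r_step = row_targets / A.sum(axis=1)   # force rows toward their margins
        A = np.diag(r_step) @ A
        r *= r_step
        s_step = col_targets / A.sum(axis=0)   # then force the columns
        A = A @ np.diag(s_step)
        s *= s_step
        if (np.abs(A.sum(axis=1) - row_targets).max() < tol and
                np.abs(A.sum(axis=0) - col_targets).max() < tol):
            break
    return A, r, s   # at convergence, A equals diag(r) @ A0 @ diag(s)

A0 = np.array([[10., 5.], [4., 8.]])                     # illustrative prior matrix
At, r, s = ras(A0, np.array([18., 14.]), np.array([16., 16.]))
```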


Interestingly, van der Linden and Dietzenbacher (2000) suggest that some element-specific third components undoubtedly muddy perfect economic interpretations of r and s. This recalls early issues attached to the regional drift component. Indeed, by using a form of RAS, Oosterhaven and Escobedo-Cardeñoso (2011) demonstrated that regional I-O tables can be forecasted fairly well by applying the “remainder” component, assuming it is knowable with some time lag.

Skolka (1989) first popularized an identity-splitting approach for identifying sources of change across I-O tables, i.e., something that could be called “structural decomposition analysis” (SDA). Rose and Casler (1996) conjectured that SDA was I-O’s equivalent of SSA. In retrospect, indeed, SDA enjoys many similarities with SSA. Both focus on causes of economic change via some sort of changes in industry shares. While SSA focuses its shares via differences over space, SDA works its shares across industries via technology change (fabrication effects). SDA is clearly much more demanding in terms of time and data requirements than is SSA; on the other hand, it can produce far richer and more diverse results as well.

With the above in mind, Lahr and Dietzenbacher (2017) merged SDA with SSA. They describe their approach using a seven-component SDA: (1) national labor use; (2) national labor multipliers; (3) the supplying region’s own share of the regional final demand; (4) differences between regional and national final demand mixes; (5) national final demand mix; (6) shares of regional total final demand in the national final demand; and (7) total national final demand. The key element from SDA that is missing is anything pertaining to the region’s set of industry-specific direct requirements; in the decomposition, industry-specific requirements are harnessed only via the national labor multipliers. All remaining factors have nation-region contrasts akin to SSA decomposition components.

10 Shift-Share in a Regression Framework

A core concern with both SSA and SDA is that they can at best indicate only proximate causation and yield no statistical significance with respect to the relationships they might reveal. A regression framework, however, enables statistical inference. The first statistical analysis using SSA was Weeden’s (1974). This was followed by a series of papers in the mid-to-late 1970s, culminating in Treyz and Stevens (1979), who examined the causes of the regional drift component more deeply. This ultimately led to Patterson’s (1991) work – a full-analog regression model of shift-share via a constrained weighted least-squares regression approach. More recent investigations into shift-share regression were summarized and extended by Di Berardino et al. (2017).
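The flavor of such statistical treatments can be conveyed with a generic sketch in the spirit of Weeden’s analysis-of-variance approach: industry-by-region growth rates are regressed on industry and region dummies, so that the industry coefficients play the role of the industry mix effect and the region coefficients that of the regional shift. This is our own illustration with simulated data, not Patterson’s constrained estimator.

```python
# Generic sketch of shift-share in a regression framework: growth rates
# regressed on industry and region effects. Data here are simulated.
import numpy as np

rng = np.random.default_rng(0)
n_industries, n_regions = 5, 10
growth = (0.02
          + rng.normal(0, 0.01, n_industries)[:, None]       # industry effects
          + rng.normal(0, 0.01, n_regions)[None, :]          # region effects
          + rng.normal(0, 0.005, (n_industries, n_regions)))  # "drift" noise

y = growth.ravel()                                  # industry-major ordering
ind_id = np.repeat(np.arange(n_industries), n_regions)
reg_id = np.tile(np.arange(n_regions), n_industries)
X = np.column_stack([np.ones_like(y),
                     np.eye(n_industries)[ind_id][:, 1:],  # drop a base industry
                     np.eye(n_regions)[reg_id][:, 1:]])    # drop a base region
beta, residuals, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[1:n_industries] are industry-mix effects; the remaining coefficients
# are regional shifts; the residuals absorb industry-specific regional drift.
```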

11 Application to Other Topics

SSA was proposed to enable an understanding of the dispersion of employment in Great Britain (the Barlow Report) and the USA (Perloff et al. 1960). It has since moved far beyond that to examine migration (e.g., Wright and Ellis 1996); racial, gender, and other demographic changes (e.g., Franklin and Plane 2004); firm growth (e.g., Johnson 2004); telecommunications service policy (Dinc et al. 1998b); tourism (e.g., Firgo and Fritz 2017); regional electricity consumption change (Grossi and Mussini 2018); and even something as particular as global rice export market shares (Lakkakula et al. 2015). This listing is by no means exhaustive. Yet it should be clear that SSA’s flexibility and accessibility in application earn it a place in every regional analyst’s tool bag.

12 Conclusions

Since Perloff et al. (1960), shift-share analysis (SSA) has become one of the most heavily applied approaches for attempting to gain a basic understanding of regional economies. It retains that role, along with the analysis of location quotients (used to identify industry concentrations), despite a cacophony of sophisticated alternatives that analysts have at their disposal. As Mulligan and Molin (2004, p. 115) note, “apparently, the popularity of shift-share will continue to endure despite numerous scathing attacks over the years.” But if nothing else is clear from this reconnaissance through the relevant literature, SSA is merely a practicable accounting identity: it never was meant to be a theory. From this perspective, the expectation that SSA become some comprehensive model of regional growth seems almost irrational. Regardless, it remains an excellent starting point for analyzing growth change and differences at the regional level – one that mathematically impaired policy wonks can generally understand.

13 Cross-References

▶ Interregional Trade: Models and Analyses
▶ Regional Growth and Convergence Empirics
▶ Regional Science Methods in Planning

References

Arcelus F (1984) An extension of shift-share analysis. Growth Chang 15(1):3–8
Ashby L (1968) The shift and share analysis: a reply. South Econ J 34:423–425
Barff R, Knight P III (1988) Dynamic shift-share analysis. Growth Chang 19(2):1–10
Borts G, Stein J (1964) Economic growth in a free market. Columbia University Press, New York
Brown H (1971) The stability of the regional share component: a reply. J Reg Sci 11:113–114
Cheng S (2011) Business cycle, industrial composition, or regional advantage? A decomposition analysis of new firm formation in the United States. Ann Reg Sci 47:147–167
Cliff A, Ord J (1981) Spatial processes: models and applications. Taylor & Francis, London
Creamer D (1943) Shifts of manufacturing industries. In: Florence PS, Fritz WG, Gilles RC (eds) Industrial location and national resources. US National Resources Planning Board, US Government Printing Office, Washington, DC, pp 85–104
De Mesnard L (1990) Dynamique de la structure interindustrielle française. Economica, Paris
Deming W, Stephan F (1940) On a least-squares adjustment of a sampled frequency table when the expected marginal totals are known. Ann Math Stat 11:427–444


Di Berardino C, Mauro G, Quaglione D, Sarra A (2017) Structural change and the sustainability of regional convergence: evidence from the Italian regions. Environ Plan C Politics Space 35:289–311
Dinc M, Haynes K, Qiangsheng L (1998a) A comparative evaluation of shift-share models and their extensions. Australas J Reg Stud 4:275–302
Dinc M, Haynes K, Stough R, Yilmaz S (1998b) Regional universal telecommunication service provisions in the US: efficiency versus penetration. Telecommun Policy 22:541–553
Dunn E (1960) A statistical and analytical technique for regional science. Pap Reg Sci Assoc 6:97–112
Esteban-Marquillas J (1972) A reinterpretation of shift-share analysis. Reg Urban Econ 2(3):249–255
Firgo M, Fritz O (2017) Does having the right visitor mix do the job? Applying an econometric shift-share model to regional tourism developments. Ann Reg Sci 58(3):469–490
Franklin R, Plane D (2004) A shift-share method for the analysis of regional fertility change: an application to the decline in childbearing in Italy, 1952–1991. Geogr Anal 36(1):1–20
Fuchs V (1959) Changes in the location of US manufacturing since 1929. J Reg Sci 1(2):1–18
Ghosh A (1958) Input-output approach in an allocation system. Economica 25:58–64
Grossi L, Mussini M (2018) A spatial shift-share decomposition of electricity consumption changes across Italian regions. Energy Policy 113:278–293
Herzog H, Olsen R (1977) Shift-share analysis revisited: the allocation effect and the stability of regional structure. J Reg Sci 17(3):441–454
Hirschman A (1958) The strategy of economic development. Yale University Press, New Haven
Isard W (1960) Methods of regional analysis: an introduction to regional science. MIT Press, Cambridge, MA
James F Jr, Hughes J (1973) A test of shift and share analysis as a predictive device. J Reg Sci 13(2):223–231
Johnson P (2004) Differences in regional firm formation rates: a decomposition analysis. Entrep Theory Pract 28(5):431–446
Jones JH (1940) Appendix II, A memorandum on the location of industry. In: Barlow M (ed) Report of the royal commission on the distribution of the industrial population (Barlow Report), His Majesty’s Stationery Office, Cmnd 6135. Ministry of Housing and Local Government, London, pp 249–280
Klaassen L, Paelinck J (1972) Asymmetry in shift- and share analysis. Reg Urban Econ 2(3):256–261
Lahr M, Dietzenbacher E (2017) Structural decomposition and shift-share analyses: let the parallels converge. In: Regional research frontiers, vol 2. Springer, Cham, pp 209–220
Lakkakula P, Dixon B, Thomsen M, Wailes E, Danforth D (2015) Global rice trade competitiveness: a shift-share analysis. Agric Econ 46(5):667–676
Leontief W (1936) Quantitative input and output relations in the economic system of the United States. Rev Econ Stat 18(3):105–125
Leontief W (1941) Structure of American economy. Oxford University Press, New York
Loveridge S, Selting AC (1998) A review and comparison of shift-share identities. Int Reg Sci Rev 21(1):37–58
MacDougall GDA (1940) Inter-war population changes in town and country. J R Stat Soc 103(1):30–60
Markusen A, Noponen H, Driessen K (1991) International trade, productivity, and US regional job growth: a shift-share interpretation. Int Reg Sci Rev 14(1):15–39
Miller RE, Blair PD (2009) Input-output analysis: foundations and extensions. Cambridge University Press, Cambridge
Mulligan G, Molin A (2004) Estimating population change with a two-category shift-share model. Ann Reg Sci 38(1):113–130
Nazara S, Hewings GJD (2004) Spatial structure and taxonomy of decomposition in shift-share analysis. Growth Chang 35(4):476–490


O’Leary E, Webber D (2015) The role of structural change in European regional productivity growth. Reg Stud 49(9):1548–1560
Oosterhaven J, Escobedo-Cardeñoso F (2011) A new method to estimate input-output tables by means of structural lags, tested on Spanish regions. Pap Reg Sci 90(4):829–844
Paraskevopoulos C (1971) The stability of the regional share component: an empirical test. J Reg Sci 11:107–112
Patterson M (1991) A note on the formulation of a full-analogue regression model of the shift-share method. J Reg Sci 31(2):211–216
Perloff H, Dunn E, Lampard R, Muth R (1960) Regions, resources, and economic growth. Johns Hopkins University Press, Baltimore
Ramajo J, Márquez M (2008) Componentes espaciales en el modelo shift-share. Una aplicación al caso de las regiones peninsulares españolas. Estadística Española 50:41–65
Ray DM, Hall PG, O’Donoghue DP (2019) The elusive quest for balanced regional growth from Barlow to Brexit: lessons from partitioning regional employment growth in Great Britain. Growth Chang 50(1):266–284
Rigby D, Anderson W (1993) Employment change, growth and productivity in Canadian manufacturing: an extension and application of shift-share analysis. Can J Reg Sci 16:69–88
Rose A, Casler S (1996) Input-output structural decomposition analysis: a critical appraisal. Econ Syst Res 8(1):33–62
Rosenfeld F (1959) Commentaire à l’exposé de M. Dunn. Econ Appl 4:531–534
Shi C, Yang Y (2008) A review of shift-share analysis and its application in tourism. Int J Manag Perspect 1(1):21–30
Sihag B, McDonough C (1989) Shift-share analysis: the international dimension. Growth Chang 20(3):80–88
Skolka J (1989) Input-output structural decomposition analysis for Austria. J Policy Model 11:45–66
Stevens B, Moore C (1980) A critical review of the literature on shift-share as a forecasting technique. J Reg Sci 20(4):419–437
Stilwell F (1969) Regional growth and structural adaptation. Urban Stud 6:162–178
Stokes H Jr (1974) Shift share once again. Reg Urban Econ 4(1):57–60
Stone R (1961) Input-output and national accounts. Organization for European Economic Cooperation, Paris
Thirlwall A (1967) A measure of the proper distribution of industry. Oxf Econ Pap 19:46–58
Treyz G, Stevens B (1979) Location analysis for multiregional modeling. Paper presented at Modeling the Multi-Region Economic System; Regional Science Research Institute discussion paper series 113
van der Linden J, Dietzenbacher E (2000) The determinants of structural change in the European Union: a new application of RAS. Environ Plan A 32:2205–2229
Weeden R (1974) Regional rates of growth of employment: an analysis of variance treatment. National Institute of Economic and Social Research, Regional Papers III. Cambridge University Press, Cambridge
Wright R, Ellis M (1996) Immigrants and the changing racial/ethnic division of labor in New York City, 1970–1990. Urban Geogr 17(4):317–353

City-Size Distribution: The Evolution of Theory, Evidence, and Policy in Regional Science

Gordon F. Mulligan and John I. Carruthers

Contents
1 Introduction
2 Foundations: 1935–1955
2.1 Background
2.2 The Main Contributions
3 New Perspectives: 1955–1975
3.1 Background
3.2 The Main Contributions
4 Mature Perspectives: 1975–1995
4.1 Background
4.2 The Main Contributions
5 Summary and Conclusion
6 Cross-References
References

Abstract

This chapter reviews the extensive thinking on city-size distributions between 1935 and 1995. It sheds light on many issues that are of fundamental importance to regional science, but especially the nature of national development and the extent of interregional inequality. Between 1935 and 1955, the city-size literature distinguished between primate and rank-size distributions (Zipf 1949) and demonstrated how a steady state could arise through Gibrat’s law of proportionate effect (Herbert Simon). Between 1955 and 1975, the city-size literature incorporated thinking from central place theory in order to provide the earliest hierarchy models (Martin Beckmann); clarified how settlement systems evolved through space and over time (Richard Morrill); connected the issue of primacy to national development (Brian Berry); and made use of the concept of hierarchy in the policy discussion of settlement systems (Gerald Hodge). The final time period, between 1975 and 1995, brought forth a maturing of these different perspectives, where primacy was found to be determined by a wide mix of social and economic factors (Ronald Moomaw), simulation models became much more complex (Roger White), and hierarchy models were articulated both with the economic base concept and with mainstream microeconomics (Carol Taylor).

Keywords

Primacy · Rank-size · Gibrat’s law · Hierarchy · Settlement systems · Simulation

1 Introduction

The study of city population size in regional and national systems has a rich and lengthy history in the economic, planning, regional, and spatial sciences (see Haggett 1965 for an early entry). In recent times the literature has become highly technical, but, in earlier times, analysts of diverse backgrounds were interested in a wide range of topics related to variation in city population size. During the five decades after World War II, the period of our current interest, some of the major research questions have been:

• Do the various cities of a region or nation comprise a balanced or unbalanced (primate) distribution?
• Do these city sizes constitute a continuum or is there some hierarchical principle at work?
• Does this distribution change as the region or nation undergoes urbanization or economic development?
• Do inventions and innovations induce faster growth in some cities at the expense of others?
• Do economic multiplier effects vary with city size?
• How do city population sizes change as the functional specialties of cities shift over time?
• Do the shifting occupational patterns of post-industrialism affect the nature of the city-size distribution?
• How does enhanced international trade impact the city-size distribution?
• Should governments regulate growth in entire systems of cities?
• Can governments regulate such growth?
• Is there some optimal distribution of city population sizes?

The city-size literature addresses so many issues that it would not be prudent to try summarizing this wide literature in a single review chapter. So, instead, this chapter highlights the main currents of thought during the 60 years between 1935 and 1995. In undertaking this task, it roughly divides that period into three different eras and discusses how the literature dealt with theory, evidence, and policy during each of those eras. No new analysis is conducted because the focus here is squarely on reviewing the historical evolution of the city-size literature.

2 Foundations: 1935–1955

2.1 Background

The literature after World War II was informed by a handful of influential books and papers that appeared in the late 1930s and early 1940s. In one of these, the geographer Mark Jefferson (1939, p. 231) observed that many countries had one dominant metropolis that was “always disproportionately large and exceptionally expressive of national capacity and feeling.” He called this the primate city – a term eventually adopted by others to mean various things. Jefferson examined some 50 countries, nearly all in the West, and formed ratios between those cities ranked #1, #2, and #3 in their population sizes. He pointed out that, in 28 cases, the largest city was more than twice as large as the second city, and in 18 more it was more than three times as large as the third city. He believed that this was enough evidence to confer the status of law on his observation, but he never advanced a rule or benchmark that could be used to test for the existence of a primate distribution. So, while Austria’s 1934 distribution of 100–8–6 seems very different from Sweden’s 1937 distribution of 100–48–26, both were believed to be primate. However, Jefferson did make the interesting observation that primacy could be mediated or undermined by various factors, including postcolonial ties, provincialism, or domination by a nearby world metropolis in a neighboring country.

2.2 The Main Contributions

2.2.1 The Rank-Size Distribution

George Zipf was a linguist who analyzed the frequency of words used in various languages. He is famous for suggesting that these frequencies follow an inverse ranking rule, where the second most common word is used one-half as often as the most common word and the third most common word is used one-third as often, and so on. He believed this pattern occurred in order to minimize, somehow, the effort involved in people communicating with one another. Zipf, and others, extended this principle to various phenomena, including household incomes, corporation sizes, drainage volumes for rivers, and the like (for a recent review of applications of Zipf’s law, see Saichev et al. 2010). While a few people before Zipf, including Felix Auerbach and Hans Singer, did note the approximate conformity of city population sizes to this rule, the popularity of Zipf’s writings led his name to become attached to the well-known relationship:


$$P_r = P_1 / r^q$$

which says that the population of the rth ranking city, $P_r$, equals the population of the first ranking city, $P_1$, divided by that city’s rank r raised to the power q. Estimation is now done with linear regression after making a logarithmic transformation, so that:

$$\ln P_r = \ln P_1 - q \ln r$$

This is called the rank-size distribution or, more commonly, the rank-size rule, especially when the special case of slope q = 1 is addressed. The rank-size distribution became popular in regional science in part because US city sizes visually conformed to the second formula, with q approximating unity, during every decade after the first national census in 1790. Here Madden (1956) remarked that the entire system exhibited stability through time even though individual cities might ascend or descend, sometimes dramatically, the rank-size ladder.

These insights led to an underappreciated paper by John Stewart (1947), who brought the perspectives of social physics and macro-geography to the city-size literature. While Stewart believed the rank-size rule reflected “underlying demographic tendencies,” he was never successful in demonstrating just what these tendencies were. But he performed analytical computations that showed how the system’s overall urban population – the sum of the populations of all cities – could be computed from a series conforming to the rank-size principle. This series converges when q > 1 but diverges when q = 1 or q < 1. Nevertheless, the urban population could be approximated in those last two cases when rank r was sufficiently large. In any case, Stewart provided various estimates of the rank-size distribution, both for the United States, setting q = 1, and for the world’s leading cities, where q < 1 was estimated. On the other hand, in later research he pointed out that some national city-size distributions might instead take on prolonged S-shaped properties as they evolved over time (Stewart 1958).

Stewart then constructed yet another empirically driven formula to estimate the urban fraction, or urbanization ratio, based on these same city ranks. He thought that if “the rank-size rule represents an equilibrium among competing cities,” then “there is also an equilibrium between the competing attractions of rural and urban life” (Stewart 1958, p. 469). In words that anticipated Madden (see above), Stewart claimed that the US rural population had already reached a ceiling, but the nation’s urban population could expect to grow at about 1% per year for the following two decades, while individual cities could expect to grow at that same rate as long as they could maintain their current ranks.
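As a quick illustration of this estimation, the following sketch fits the log-linear form by ordinary least squares to a handful of made-up city populations; the numbers are purely illustrative.

```python
# Minimal sketch of estimating the rank-size exponent q from
# ln P_r = ln P_1 - q ln r. City populations are illustrative.
import numpy as np

populations = np.array([8_400_000, 3_900_000, 2_700_000, 2_300_000,
                        1_600_000, 1_500_000, 1_400_000, 1_300_000], dtype=float)
populations = np.sort(populations)[::-1]       # order cities from largest down
ranks = np.arange(1, populations.size + 1)

slope, intercept = np.polyfit(np.log(ranks), np.log(populations), 1)
q = -slope                                     # q close to 1 -> classic rank-size rule
print(f"estimated q = {q:.2f}")
```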

2.2.2 Random Growth

A few years later, the polymath Herbert Simon (1955) wrote an influential paper that addressed the same domain of empirical regularities as Zipf. But now the intent was to demonstrate, in statistical terms, the conditions allowing highly skewed J-shaped distributions – those with long upper tails – to appear so widely in both the human and natural realms. While he did not cite the work of Robert Gibrat (1931), his model of stochastic growth was really a variation on Gibrat’s thinking, so that earlier proposal should be discussed. Gibrat was more interested in the growth of firm sizes, and his growth model was based on three principles: (i) the causes of job growth are numerous; (ii) their instantaneous effects on job numbers are independent of existing job numbers; and (iii) these increments in job growth are all very small. In other words, the process of firm growth involves the cumulative effects of many small positive and negative job fluctuations. Another way to look at this is to distinguish between processes (variables) where random events are summed up and those where random events are multiplied together: for all practical purposes, the former distribution is normal, while the latter is lognormal in shape. This led to Gibrat’s so-called law of proportionate effect where firms, or cities, would adjust their sizes over time until, eventually, a steady state of sizes would appear and take on lognormal properties. Gibrat’s model was acknowledged later in a popular book, The Lognormal Distribution, written by J. Aitchison and J. Brown (1957), that addressed income inequalities.

The major difference between the Yule process advocated by Simon and the lognormal process is that the former allows firms (or cities) to enter, or depart from, the pool of existing firms (cities) over time. In other words, Simon’s perspective addresses an open system: not one that is entirely closed off to births or deaths. However, when dealing with large entities located in space, a couple of additional concerns arise in stochastic processes, and these are often overlooked even in today’s highly technical papers (Vining 1974, 1977). The first deals with the so-called threshold effect. Cities have a certain minimum size, which can vary because urban places are defined in different ways, and the rank-size rule does not usually hold all the way down to that minimum size; instead rank-size regularity typically holds, when it does, only along the upper tail of the lognormal distribution and along the upper tail of an actual city-size distribution. The second issue concerns how either boundary conditions or neighborhood (spillover) effects, both prevalent in spatial processes, might distort either Gibrat’s growth process or the possibilities of entry or exit through migration. This second issue was taken up much later by Eric Sheppard (1982), and others, but was certainly recognized in the context of settlement spacing. Positive spillovers can change the frequencies of population sizes throughout a settlement system and, under certain conditions, even shift a more balanced distribution toward a more primate one.
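The law of proportionate effect is easy to see in a toy simulation (ours, with arbitrary parameters): start every city at the same size, apply many small multiplicative shocks that are independent of size, and the log of city size drifts toward a normal distribution, i.e., sizes become lognormal.

```python
# Toy simulation of Gibrat's law of proportionate effect: repeated
# small, size-independent multiplicative shocks yield lognormal sizes.
import numpy as np

rng = np.random.default_rng(42)
n_cities, n_periods = 10_000, 200
sizes = np.full(n_cities, 10_000.0)       # every city starts at the same size

for _ in range(n_periods):
    shocks = rng.normal(loc=1.0, scale=0.03, size=n_cities)
    sizes *= shocks                       # growth rate independent of current size

log_sizes = np.log(sizes)                 # approximately normal -> sizes lognormal
print(f"mean ln(size) = {log_sizes.mean():.2f}, sd = {log_sizes.std():.2f}")
```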

2.2.3 Central Place Theory

The other key contribution to the city-size literature at this time was the research on central place systems undertaken by Walter Christaller (1933) and August Lösch (1938) in Germany. Although their ideas were somewhat different, in part because Christaller was a geographer and Lösch (1954) an economist, they both argued that city systems are organized according to hierarchical principles, where places on higher levels provide a greater variety of goods and services than places on lower levels. In fact, these large places occupy more central locations on the landscape, and this allows them to "export" a distinctive bundle of private and public goods to small places and to rural areas found in the vicinity. Through adoption and adaptation, an efficient nesting arrangement was believed to emerge where large places would serve extensive market areas and small places would serve restricted market areas. This nesting arrangement in turn would generate a frequency distribution for the central places that occupied different levels: in Christaller's K = 3 system, the numbers of places on lower and lower levels of the hierarchy, which in turn formed larger and larger size classes, would be 1, 2, 6, 18, 54, and so on. So, in short, the efficient use of homogeneous space, combined with scale economies in production, dictates that there always will be few large cities, more medium-sized cities, and many more small cities in a settlement system.

These ideas were first brought to the United States by a handful of people like Edward Ullman (1941), a geographer, even though similar research on settlements in agricultural regions had been carried out earlier by Charles Galpin and John Kolb, who were rural sociologists. It seems, too, that the early British research on urban hierarchies was influenced more by inductive generalization and, perhaps, the ideas of the American sociologists than by the systematic work of the Germans (Smailes 1944; Haggett 1965). Of course, both Christaller and Lösch were fully aware that their idealized systems would be distorted by local and regional factors, including industrial concentrations, port locations, and transportation routes. This key point was emphasized in a famous paper by Chauncy Harris and Ullman (1945) and then revisited by Otis Duncan et al. (1960) in their informative volume on US metropolitan functions and industries. Moreover, as early as the 1930s, people in the United States and United Kingdom were noting that even the settlement patterns of peripheral regions were not static, because inexpensive and flexible means of transportation were changing both buying and marketing practices. In many of these regions, the smallest places in central place systems were not only declining in size but sometimes even disappearing. These central place principles had a very important impact on the city-size models that were formulated in later times.
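As a quick check on this arithmetic, the counts per level follow mechanically from the nesting factor K (the function below and its name are ours, not Christaller's):

```python
def places_per_level(K: int, levels: int) -> list[int]:
    """Number of new central places on each level of a Christaller hierarchy.

    The top level has one place; each subsequent level multiplies the total
    number of market areas by K, so the count of new places on level m >= 2
    is (K - 1) * K**(m - 2).
    """
    return [1] + [(K - 1) * K ** (m - 2) for m in range(2, levels + 1)]

print(places_per_level(K=3, levels=6))  # [1, 2, 6, 18, 54, 162]
```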

3 New Perspectives: 1955–1975

3.1 Background

The early part of this second period was largely dominated by debate regarding the role of cities in economic development. Development in the past had involved moving people from low-productivity agriculture to high-productivity manufacturing, thereby relieving pressure on the land and allowing improvements to be made in agricultural practices. The conventional wisdom was that both per capita income and occupational structure would converge among the regions of a nation as industrialization proceeded and a vibrant service sector emerged. While some did not believe that urbanization would necessarily follow industrialization, others disagreed and felt that internal and external scale economies ensured that cities would continue to grow as industrialization unfolded (Lampard 1955). The development economist Benjamin Higgins (1968, p. 466) was especially prescient in claiming that "highly trained people of the kind needed for middle and high positions in either the public or the private sector like to live in or near cities." He cited the cases of Brasilia and Canberra as examples of the difficulties involved when governments moved such talented people from large places to small places by edict.

3.2 The Main Contributions

3.2.1 Primacy

The backdrop to this discussion was the realization that postwar urbanization in many parts of Latin America and Asia was going to be very different. These areas were characterized by technological and regional dualism and by widening demographic gaps that meant youthful population cohorts were starting to appear everywhere. However, the United Nations and a variety of other agencies and commissions pointed out that many of these developing nations were dominated by a single city, often the capital, that was growing even faster than smaller cities. In fact, many of these nations simply lacked a stratum of intermediate-sized places, so the national city-size distributions of Argentina, Mexico, the Philippines, and so on were not only top-heavy but becoming even more so.

But this discussion took another interesting twist. Borrowing from the insights of anthropology and sociology, Berthold Hoselitz and others argued that the cultural context for economic growth differed significantly between the more developed and less developed world. He pointed out that status and rewards in the former mainly arose through achievement, whereas they were mainly due to ascription in the latter. Moreover, the developed nations had adopted more universalistic attitudes, while the developing nations retained more particularistic ones. This meant that the social and political transformation of society was going to be very different in the newly emerging nations than it was earlier in the United Kingdom, France, and the United States. Moreover, many of the newly independent developing nations, especially in Africa, were left with colonial imprints, including inefficient bureaucracies and transportation systems largely designed to remove resources directly from the hinterland to ports in coastal areas. As a result, Hoselitz (1955) and others differentiated between generative cities, on the one hand, and parasitic cities, on the other. The thinking was that many large capital cities would impede the rapid economic growth that was needed in these emerging nations during the accelerating stage of the urbanization curve.

It was at this time that the term primacy took on a strong negative meaning, even though most observers agreed with John Friedman (1961) that cities were likely the only agents that could disrupt or destroy old social forms and reintegrate societies around newer, more impersonal ones. At the same time, it was noted by Norton Ginsburg (1961) that the primate city appeared in "more prosperous and long-established, as well as in poorer, newer countries, but it does appear to be more common in the latter than in the former" (p. 36). The paper by Surinder Mehta (1964, p. 147) brought the discussion to a pause when he concluded that primacy did not seem "to be a function of the level of economic development, industrialization, or urbanization."

This, of course, was the background to the international discussions of the mid- and late 1950s that informed much of our thinking at that time about the role of urban policy. The thoughts of François Perroux and his students on growth poles, and the debate between Gunnar Myrdal and Albert Hirschman regarding uneven growth in space, led many to conclude that governments should "rationally" allocate resources in order to reshape regional and national settlement systems. This case was picked up sometime later by people like Niles Hansen and Harry Richardson, who advocated the selective use of focused investments to jumpstart regional economic growth. These views eventually led to a discussion about "optimal" city sizes and city-size distributions that still resonates today (see below).

3.2.2 Location Theory

In 1956, Walter Isard published Location and Space-Economy, which coincided with the birth of the new discipline of regional science. Isard examined a variety of Zipfian "empirical regularities," where he paid special attention to the rank-size distribution and to distance decay rates for different modes of human interaction. Here he made the case that the regularity of intercity flows and the regularity of city spatial patterns were possibly evidence for a regular hierarchy of cities. Later, after commending Lösch for his efforts at trying to specify a general spatial equilibrium model, Isard pointed out the fundamental contradiction of his economic landscape: while postulating an even distribution of consumers on the one hand, Lösch recognized that workers (who are consumers) are needed at production points on the other. So, market areas must be smaller near the primate city found at the very center of the space economy. Basically, then, Isard was recognizing that any regional settlement system will be shaped by its underlying population-density gradient, an observation entirely consistent with the observation made later by Berry and Horton (1970, p. 92) that city areas would vary directly with population but inversely with population potential. Isard's stance, though, was that this variable density would affect only the spacing between cities; in fact, this element of heterogeneity could disturb both the frequency and the population sizes of cities in the settlement system. Isard even acknowledged that the settlement system shown in his famous diagram (Fig. 52, p. 272) is only one of several scenarios that could evolve. Isard was also deeply interested in what he called the "spatial physiognomy" of urban areas: how they are assembled as "systems within systems of cities" (Berry 1964).

3.2.3 Hierarchy Models

Another important stream of research was then initiated by Martin Beckmann when he outlined the first city-size model that reflected the thinking of central place theory. Using only a few assumptions, including a constant ratio between the population of a city and the population of its entire market area, Beckmann was able to recursively solve the population sizes for cities on higher and higher levels of the central place hierarchy. In the end, then, both city populations and market area populations could be traced out as exponential functions of hierarchical level. While this meant that there would be a step function for the city-size distribution, he suggested that the operation of many random events would blur this step function and make his model compatible with the rank-size rule. However, the logic behind Beckmann's first model was partly flawed – being more consistent with the income flows that Jan Tinbergen (1967) visualized for a spatial hierarchy – but he corrected his error for the well-known book Location Theory that appeared a decade later (Beckmann 1968).

At the same time, John Parr (1969) was revisiting the Beckmann model with the intention of making its internal structure more consistent with Christaller's spatial logic. As before, a single proportionality factor k (0 < k < 1) was introduced and then assumed to be fixed, from one level of the hierarchy to the next, in order to connect the population of each city to the population of the market area it would serve. This meant, again, that city populations would vary exponentially from one level of the hierarchy to the next. This result was consistent with a wide range of findings where traffic generation, the numbers of business establishments, and the like were found to vary exponentially with hierarchical level. The earliest version of this model adopted a fixed nesting factor K, but Parr later moved on to address both variable nesting factors and the more complex spatial logic of Lösch. Such modifications, of course, brought significant change to the shape of the overall city-size distribution; however, the model so modified was no longer consistent with the rank-size rule.

The next important advance was made in a short paper by Michael Dacey (1966), which remains underappreciated even now. His model focused more on the different bundles of goods and services that would be supplied by centers found at various levels of the hierarchy. The crucial element in Dacey's approach was to have a bundle-specific term km (0 < km < 1; m = 1, 2, 3, . . .) connect the body of population in a center engaged in supplying the level m bundle to the total population in the surrounding market area receiving the level m bundle. In other words, a family of bundle-specific supply and demand functions, represented by different proportionality factors, was introduced without ever making the properties of those two functions explicit. This model was basically reinvented, and extended to include variable nesting factors, by Beckmann and McPherson (1970).
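A sketch of the recursion behind such hierarchy models, under illustrative parameter values of our own choosing (the function, its name, and the rural base population are assumptions following the standard textbook rendering rather than any one source):

```python
def beckmann_city_sizes(k: float, s: int, rural_pop: float, levels: int):
    """City and market-area populations in a Beckmann-style hierarchy.

    k: fixed ratio of a city's population to its whole market-area population
    s: nesting factor (number of level m-1 areas served by a level-m center)
    rural_pop: rural population inside a lowest-level market area
    """
    area_pop = rural_pop / (1.0 - k)  # level 1: A1 = c1 + rural, c1 = k * A1
    out = []
    for m in range(1, levels + 1):
        out.append((m, k * area_pop, area_pop))
        area_pop = s * area_pop / (1.0 - k)  # A(m+1) = c(m+1) + s * A(m)
    return out

for m, city, area in beckmann_city_sizes(k=0.4, s=3, rural_pop=6_000, levels=5):
    print(f"level {m}: city pop = {city:12,.0f}, market area = {area:12,.0f}")
# City sizes grow by the constant factor s/(1 - k) from level to level,
# i.e., exponentially with hierarchical level, as noted in the text.
```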

3.2.4 Empirical Regularities

However, without doubt the biggest influence on the city-size literature was made by Brian Berry, whose series of papers in the late 1950s and early 1960s spread the ideas of central place theory to a wide audience of geographers and planners. Numerous papers are notable, although the upcoming discussion focuses on only three of these. In the first paper, coauthored with William Garrison, the settlement system of Snohomish County in Washington State was examined to see if Christaller's principle of hierarchy held true. Using extensive fieldwork, the authors charted the incidence of different central functions (goods and services) across that regional settlement system. Here they found that (i) cities on a given level of the hierarchy generally offered all the goods and services found in cities on lower levels plus a bundle that was distinctive for that level and (ii) this implied that more income flowed into those places on higher levels which, in turn, ensured that discrete population levels would emerge throughout the regional system. The paper partly assuaged the concerns of the economist Rutledge Vining (1955), who believed that a continuum of populations, and not a hierarchy, would always occur on the landscape. Still later, the demographer Donald Bogue (1959) expressed similar concerns, and these were not put to rest until it was pointed out by Berry that most studies mixed the results from various regional systems, often having very different incomes and densities, and this blurred the steplike features of each regional hierarchy.

Subsequently, Berry and Garrison (1958) summarized the various recent proposals for city-size frequencies, especially those given by Zipf, Simon, and Christaller, and argued that there was a clear need for "articulated empirical and theoretical research" (p. 90) on the topic. In the end, the authors argued that the rank-size distribution represents average or steady-state conditions and, as such, it is a finding that might not generate as much interest as when a nation is dominated by a single, primate city or when numerous regional centers of about the same size compete for that #1 position.

The third paper, by Berry alone, spurred a lot of interest in cross-national comparisons of city sizes. Using the same six size classes, cumulative frequencies of city populations exceeding 20,000 were plotted for 38 countries using lognormal probability paper. However, these countries were not independently chosen, and, under this methodology, those nations with steep slopes had less populous, not more populous, large cities. Berry found that 13 of those countries exhibited lognormal distributions which, for all practical purposes, conformed to the rank-size rule. Another 15 countries had primate distributions, and the remaining nine had distributions intermediate between lognormal and primate. His conclusions, though, have remained controversial in part because he interpreted cross-sectional results with a time series (evolutionary) model. In any case, Berry suggested that the rank-size rule could be expected more in those nations that had complex industrial economies, were large in physical extent, and had long histories of urbanization. Under those conditions, the forces of entropy would be more apparent, and any tendency toward dominance by a single, primate city would be likely to dissipate over time.

But later in this second era of research, another influential paper, by Gerald Hodge (1965), appeared that nicely brought together theory, empiricism, and a concern for public policy. Hodge focused on the viability of trade centers in southern Saskatchewan, Canada, although his methods and conclusions were clearly applicable to the many other farming regions spread across the Great Plains. As noted earlier, many of these trade centers had been heavily impacted by technological change in personal transportation and marketing and by shifts in the distribution systems for both primary commodities and retail goods. Moreover, rural residents were generally more affluent in the 1960s than they were before World War II. Hundreds of these central places not only served as economic centers in the narrow sense – providing farm supplies and low-level goods like gas and groceries – but they often provided much needed forums for social interaction and cultural fulfillment as well. Hodge was interested in identifying those places that would likely decline, or even disappear, so that provincial governments could target their investments in public goods and services to those places that would prosper or at least survive.
From the start he noted that during 1941–1961 (i) the average spacing of trade centers expanded for small centers but contracted for large centers, (ii) intermediate-sized centers generally exhibited declining numbers, and (iii) small centers located close to large centers were the most vulnerable of all. After undertaking a multivariate analysis that addressed the demographic, economic, and social characteristics of these places, Hodge established a 7-tiered hierarchy ranging from hamlets, with 50 people or so, up to primary wholesale and retail centers, with more than 100,000 people. The lowest-level places averaged a little more than 3 establishments, and the largest housed more than 1,400 stores, offices, and other businesses. The analysis clearly endorsed his earlier observations and offered other perspectives, including the key role of hospitals and high schools in maintaining the long-run viability of the centers. Moreover, he constructed the framework for a Markovian analysis of shifts in the proportions of trade centers across population size classes, calibrated by the 1941–1961 figures, but did not make predictions using that model. Hodge's work was later followed up in numerous studies under the guidance of Jack Stabler, where the issue of economic viability was revisited and other topics like business threshold requirements and community multipliers were addressed. The overall theme is that a region's city-size distribution is constantly evolving given the multitude of decisions being made by private and public agents.
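The Markovian framework Hodge set up amounts to repeatedly applying a transition matrix to the vector of size-class proportions. A toy version with three classes (the matrix below is invented; Hodge's was calibrated on the 1941–1961 figures for seven tiers) might look like this:

```python
import numpy as np

# Rows: current size class (hamlet, village, town); columns: class next period.
# Each row sums to 1; the values are purely illustrative.
P = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.80, 0.10],
              [0.02, 0.08, 0.90]])

shares = np.array([0.60, 0.30, 0.10])  # current proportions of trade centers
for period in range(3):                # project three intervals forward
    shares = shares @ P
    print(f"period {period + 1}: {np.round(shares, 3)}")
# Repeated application converges toward the chain's steady-state distribution.
```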

3.2.5 Early Simulations

From the various studies of stochastic processes, including those discussed above, there emerged a realization that computer-driven simulation models might provide useful insights into the evolution of city systems. The use of these models in addressing city sizes begins with the pioneering work by Hägerstrand on spatial diffusion. Of special interest are several papers, including one where Monte Carlo measures are introduced to probabilistically assign activities in space. This is a flexible approach to modeling that allows features like uneven population distributions or physical and psychological barriers to influence the assignment of activities to space. An early application of this approach is the imaginative study by Richard Morrill (1963) of middle Sweden's evolving settlement system between 1860 and 1960. Here Morrill pointed out that "certain possibilities, for example locations for an activity, are more likely choices than others and can be given higher probabilities of being selected" (p. 10). He takes as a starting point the region's population pattern at some given point in time, and then he assigns, in order, new intercity transportation links, new manufacturing activities, and then new service activities. The next step in this sequence is to calculate new population numbers for the various cities and then assign, using gravity-type measures, streams of migrants between the cities throughout the regional system. These last probabilistic assignments are an especially important step, and they make use of Hägerstrand's migration probability field. The Swedish study involved 155 areas, so there were 155 × 155 = 24,025 different migration paths to consider during each stage of the study.

This simulation methodology was brought to maturity by Brian Robson (1973), who focused on the growth of cities in England and Wales between 1801 and 1911. Robson examined how three major innovations – the telephone, building societies, and gas-lighting – spread among British towns and cities during the nineteenth century. Much like Morrill, he assigned new activities to cities and then projected new population figures after these activities had been adopted. Robson acknowledged the influences of Gibrat, Hägerstrand, and Christaller in developing the various technical aspects of his model of urban growth. Contrary to the opinions of many demographers, who emphasized declines in national fertility levels, he argued that the reduction of disruptive innovations around 1870 was largely responsible for the slowing down of population growth in the nation's very largest cities. Near the end of the book, he provides an interesting commentary on the role of government in urban development. Evidently the British national government had invested heavily in telegraph infrastructure, by bailing out private companies during the 1870s, and was very reluctant to support new telephone infrastructure just a few years later. The scale advantages of having centralized control were also lost because municipal governments viewed this innovation as a means for generating local revenue and not as a mechanism for spurring on entrepreneurial capitalism throughout the entire nation. Robson's simulation scenarios were especially valuable because they demonstrated how the more macro attributes of society – in demography, economics, and political action – could have differential effects on the growth prospects of cities that were integrating into a national system.
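Though Morrill's actual procedure was far richer, the core Monte Carlo step – assigning migrants in proportion to gravity-type weights drawn from a probability field – can be sketched in a few lines (all populations and distances here are invented):

```python
import numpy as np

rng = np.random.default_rng(7)

pops = np.array([120_000, 45_000, 30_000, 8_000], dtype=float)  # city sizes
dist = np.array([[1, 50, 80, 120],     # intercity distances; the diagonal
                 [50, 1, 40, 90],      # is set to 1 only to avoid dividing
                 [80, 40, 1, 60],      # by zero (self-moves get weight 0)
                 [120, 90, 60, 1]], dtype=float)

origin = 0
weights = pops / dist[origin] ** 2     # gravity-type weight: P_j / d_ij^2
weights[origin] = 0.0                  # no self-migration
probs = weights / weights.sum()        # the "migration probability field"

# Monte Carlo step: assign 1,000 migrants from the origin to destinations
moves = rng.choice(len(pops), size=1_000, p=probs)
print(np.bincount(moves, minlength=len(pops)))  # migrants per destination
```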

4 Mature Perspectives: 1975–1995

4.1 Background

During this third and last era, the city-size literature continued to mature, although the conversation became increasingly focused on a couple of topics. First, there was finally a realization that the testing of hypotheses using national population data needed both updating and econometric refinement. One main change here was that up-to-date and more reliable population data sets had become available through agencies like the United Nations and the World Bank; moreover, analysts now realized that the separate effects of various demographic, economic, and even political factors could be estimated simultaneously. Second, a handful of location theorists realized that Beckmann's simple hierarchical model was ripe for both reinterpretation and extension. More complicated models, even of the static variety, were needed to capture the differences that would exist between cities on one level of the hierarchy and those on another level. As this era closed, analytical models had been developed that addressed more realistic behavior on the parts of both households and firms. Moreover, there was increasing recognition that the city hierarchy was not only channeling, but facilitating, the movement of people, ideas, and even diseases across the landscape. Surely, then, the current spatial attributes of settlement systems had important implications for the future growth of places in any of those systems. So, third, it was not that surprising when a new round of more sophisticated simulation models appeared that brought together key aspects of spatial economic behavior in a more dynamic framework.

4.2 The Main Contributions

4.2.1 Deeper Empirical Investigations

A handful of key papers appeared between 1975 and 1995 that recharged the debate on urbanization, economic development, and primacy, although much of the ensuing literature did not appear until after 2000. The literature of this third era seems to have had two major strands. In the first instance, a firm connection was established with the widely held hypothesis, advanced by Jeffrey Williamson (1965), that economic development leads to an initial widening and subsequent contracting of resource inequality. In the regional and spatial sciences, this hypothesis is often associated with William Alonso (1980), who argued that regional development was typically characterized by a variety of bell-shaped curves. In the second instance, there was some reconsideration of the arguments outlined above where primacy was often seen to be urban over-concentration attributable to either poor or biased public decision-making. Moreover, a United Nations (1993) report focusing on urbanization prospects of the day seems to have been especially influential in making people believe that megacities were universally bad.

The resource inequality issue was raised in the early 1970s, but, with hindsight, it seems the concerns raised then had more to do with the limited attention given to longitudinal or historical studies of national city systems. Most studies were cross-sectional and simply assumed that similar estimates or results would be reflected in time series approaches. We now know that this assumption is often incorrect. But two other key improvements were made by Servet Mutlu (1989) in his extensive study of national settlement systems. On the one hand, he disentangled the issue of primacy from that of over-concentration by computing separate indices for each: the former index focused solely on the comparative population size of the largest city, but the latter index analyzed the full upper tail of largest population sizes. He also shifted the approach from correlation analysis to multiple regression analysis and then tested for the joint effects of several factors on the national levels of primacy across more than 60 nations. In the end, he found that both primacy and concentration were negatively related to the nation's population size, its level of urbanization, and its level of economic development. He also found that primacy and concentration were exacerbated by several conditions, including the degree of income inequality and the incidence of capitalism in the organization of society. So, his conclusions did suggest some policy avenues were open for nations to consider, such as reducing personal and regional income gaps and separating the nation's administrative functions from its economic functions. Although he mentioned it only obliquely, Mutlu was one of the first to acknowledge that endogeneity might exist in his methodology and that, for example, resource inequalities might not just cause, but follow from, unbalanced city-size distributions.

The stream of empirical work by Moomaw and Shatter (1993, 1996), which begins in the early 1990s, was the most insightful during this last era of research that we cover. Influenced by several others, they generally confined their estimation to a narrow range of parameters that were widely believed to affect both metropolitan concentration and urban primacy. Moreover, they adopted panel data, involving a mix of time series and cross-sectional figures, in making those estimates. Their thinking, entirely correct, was that some conditions vary slowly with historic trends, and these would confound any simple, cross-sectional estimates of primacy. The earliest study differentiated between the negative consequences of having high population shares in the largest city and the positive consequences of having those shares spread more evenly across the nation's largest cities. This finding disputed the still popular contention that the welfare costs of population growth in a nation's largest city usually outweighed the benefits of that growth. Later, careful attention was given to the implications for urban primacy, during the period 1960–1980, of several conditions, including per capita GDP, the relative importance of both agricultural and manufacturing jobs in the labor force, the incidence of exports, and the rate of literacy. Steps were also taken to control for other conditions, including whether the nation's capital was also its largest city. Primacy was seen to be greater when the economy was smaller, per capita GDP was lower, the export share was lower, and literacy was lower. Of course, several of that paper's conclusions were qualified by reference to the fixed and regional effects of the estimation. But, overall, the evidence increasingly suggests that urban primacy follows a bell-shaped curve and is a normal consequence of national development: overall dominance by the nation's largest city is typically low to begin with, rises in importance as focused development takes off, and then declines as development matures and spreads in space. So, in most cases, primate cities might be large, but they are likely not parasitic.

4.2.2 Better Hierarchy Models

The conversation on hierarchy models resumed with another short paper by John Parr et al. (1975) that revealed the similarities between the models designed by Dacey (1966) and by Beckmann and McPherson (1970). This paper also clarified the stepped economic (export) base underpinnings of the entire hierarchical perspective, where the multiplier for a level m place is Bm = 1/(1 − k1 − k2 − . . . − km). So, places on higher levels of the hierarchy will always have larger populations than places on lower levels for two reasons: (i) they require more basic workers to produce a greater variety of goods and services for their export markets, and (ii) they require more nonbasic workers to support these greater numbers of export workers. Soon thereafter Hubert Beguin (1979) showed that the results of this new model were compatible with population figures generated by the Beckmann-Parr model, on the one hand, and with population figures conforming to the rank-size rule, on the other hand: in both instances a unique vector (k1, k2, . . ., km) of service multipliers could be established where this compatibility would exist. Later, Mulligan used this model to demonstrate how rational shopping choices will reallocate the population sizes of places in Christaller's system of central places. In fact, just as empirical studies revealed earlier, multipurpose behavior will induce shifts in population away from places that are low on the hierarchy toward places that are high on the hierarchy. This research also informed the growing literature on consumer behavior, which was trying to disentangle the economic benefits and spatial consequences that arise from multipurpose versus comparison shopping. Moreover, due to the linear properties of the model, the urbanization ratio (or urban-rural balance) of the entire settlement system can be shown to be U = k1 + k2 + . . . + kM, summing across all M levels. So, the hierarchy model with level-specific service multipliers is sufficiently flexible to bring together the economic base concept, the rank-size rule, and the degree of urbanization in a single formulation.

These issues and others are touched upon in more detail in a lengthy review paper by Mulligan. Here it was pointed out that this literature was evolving in such a way that analysts could now clearly distinguish between the production functions of firms and the demand functions of households. The paper by Gershon Alperovich (1982) is a case in point, where he shows that the features of the city-size distribution will depend upon the nature of scale economies, and diseconomies, that are made available to cities in the production of goods and services. This era of hierarchical models ends with an outstanding contribution from Carol Taylor that outlines a comprehensive spatial equilibrium of central places. Although multipurpose shopping is not permitted, and the spatial logic is more Tinbergen than Christaller, the production of goods and services on each level of the hierarchy accommodates both internal and external (Chipman type) scale economies. In summary, then, these static models make it very clear that the attributes of any city-size distribution will be affected by numerous forces, including, but not restricted to, the following: the behavior of firms, the behavior of consumers, and the transportation infrastructure that is available for shippers and for shoppers. It must be recalled, too, that these models only address activities of the central-place variety, and modifications must be made to allow for other economic or functional specialties among the various cities in the regional or national system (Harris and Ullman 1945; Duncan et al. 1960).
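A small worked sketch of these two formulas (the k values are invented and the variable names are ours):

```python
# Service coefficients k_m: share of a market area's population engaged in
# supplying the level-m bundle (the values are illustrative only).
k = [0.15, 0.10, 0.08, 0.05]

cum = 0.0
for m, k_m in enumerate(k, start=1):
    cum += k_m
    multiplier = 1.0 / (1.0 - cum)  # B_m = 1/(1 - k_1 - ... - k_m)
    print(f"level {m}: B = {multiplier:.3f}")

print(f"urbanization ratio U = {sum(k):.2f}")  # U = k_1 + k_2 + ... + k_M
```

Note how the multiplier rises with hierarchical level, which is exactly why higher-level places always carry larger populations in these models.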

4.2.3 More Advanced Simulations

Simulation models reached an even higher level of sophistication in this era, beginning with the research of Roger White (1974). Echoing the earlier work of Gerard Rushton (1971), White was interested in whether a city hierarchy would emerge when behavioral rules for agents were combined with any initial arrangement of production points. After establishing the revenue and cost conditions for producing two "goods" at different locations across a rectangular map, he solved for a series of profit equations at sequential points in time. Simulations were then repeated to trace out the most likely fortunes of 20 different centers. When the friction of distance was low, these centers could compete over great distances, and the advantages of universal centrality became very clear. But when the friction of distance was high, distances to neighboring competitors became more important, and centers did not become so different in population size. Similarly, when thresholds (fixed costs) for businesses were raised, the surviving centers were evenly dispersed over the landscape, and they had similar population sizes. Clearly, the basic tenets of central place theory were endorsed here, although the features of the city system would depend upon the economies of scale available to producers and the ease of transportation available to consumers. But the novel contribution was the dynamism brought to the standard theory: the equilibrium features of the system were seen to be short-lived, and they could be disturbed by any disruptive changes that were brought to the behavior of producers or consumers. White (1977, 1978) followed up this work with two more papers a few years later, and here the sources of instability in city-size evolution were examined in more detail. He also specifically tested whether simulated population sizes were generally conforming to the rank-size distribution. In the end, he concluded that this distribution resulted when realistic values were used for the various interaction, cost, and population parameters of the simulations. Primacy, on the other hand, typically appeared when the population parameter for high order goods – those having a low friction of distance – was unrealistically high, and then instabilities crept into the various simulations. So, the primate distribution occurs when conditions conspire to allow the population growth of cities to exceed their systemic constraints, and this can happen only in those places occupying locations of high centrality where high order goods and services are provided for extensive market areas.

This stream of research ended sometime before the mid-1980s. In its later contributions, the adaptive capacity of the hierarchy was extended to accommodate the births and deaths of central places. Again, using only two functions, it was shown that a multitude of possible stable solutions could appear on the landscape. But, when a third high-order function was introduced, and agglomeration could occur through positive feedbacks, the simulated results became more interesting. A size distribution resembling the rank-size rule invariably appeared even though the populations and locations of the intermediate-sized places varied a lot. Intracity variation was next considered, where population concentrations would first promote population decentralization, with a local gradient, and then, eventually, economic decentralization, with commuting. The most complex simulations suggested that regional systems of cities would naturally appear where the largest cities across those different regions would be involved in competition for future population growth. Very clearly, externalities and centrality work together in these models, where positive externalities lead to high population concentrations and negative externalities lead to more dispersed populations over the landscape.

5 Summary and Conclusion

This chapter has focused on the city-size literature in the regional and spatial sciences during the period 1935–1995. This span was divided into three 20-year eras in order to clarify the progressive evolution of thought in the areas of theory, empirical testing, and policy formation. In recent years the literature has become even more technical and focused. Important ideas have come from physics and mathematical statistics, but the biggest impact has come from the ideas of the so-called new economic geography, and others, who have brought an entirely new analytical perspective to the existence of dynamic equilibria in spatial systems. Empirical work has also become more sophisticated, and authors have argued that a handful of factors like economies of scale and transportation costs suffice to explain the changing nature of urban primacy and that analysts have no need to revert to dependency theory or world systems models to explain the phenomenon. Perhaps the biggest need going forward is for regional science to successfully bridge the gap between the kind of positive and normative questions set out in the introduction to this chapter. As knowledge of city systems has evolved, it seems that there still exists a shortcoming for that knowledge, particularly as theoretical models have become more sophisticated. As this chapter has made clear (and illustrated with contemporary maps and data), the field has built a massive foundation on which to base new insights and policy prescriptions.

To this point, the regularities documented throughout this chapter are more than intriguing geographic features. Look no further than the Northeast Corridor of the United States: a projected map of the region, based on point patterns developed by the authors, is shown in Fig. 1. This Anatomy of a Metropolis (Hoover and Vernon 1959) is on full display as the latitude (y-axis) and longitude (x-axis) of the ~36,675 individual neighborhood centers (census block groups) that make up the region are graphed against one another. The regression line running through the center of the region (latitude regressed on longitude) has an R2 value of over 0.80, suggesting that the Northeast Corridor is among the most organized spatial structures on earth.

Fig. 1 Anatomy of a Metropolis

6 Cross-References

▶ Complexity and Spatial Networks

References

Aitchison J, Brown J (1957) The lognormal distribution. Cambridge University Press, Cambridge
Alonso W (1980) Five bell shapes in development. Pap Reg Sci Assoc 45:5–16
Alperovich G (1982) Scale economies and diseconomies in the determination of city size distribution. J Urban Econ 12:202–213
Beckmann M (1968) Location theory. Random House, New York
Beckmann M, McPherson J (1970) City size distribution in a central place hierarchy: an alternative approach. J Reg Sci 10:25–33
Beguin H (1979) Urban hierarchy and the rank-size distribution. Geogr Anal 11:149–164
Berry B (1964) Cities as systems within systems of cities. Pap Reg Sci 13:147–163
Berry B, Garrison W (1958) Alternate explanations of urban rank-size relationships. Ann Assoc Am Geogr 48:83–91
Berry B, Horton F (1970) Geographic perspectives on urban systems. Prentice-Hall, Englewood Cliffs
Bogue D (1959) Population distribution, Chapter 17. In: Hauser P, Duncan O (eds) The study of population: an inventory and appraisal. University of Chicago Press, Chicago
Bourne L (1975) Urban systems: strategies for regulation. Clarendon Press, Oxford
Christaller W (1933) Die zentralen Orte in Süddeutschland. Fischer, Jena. English translation by Baskin C: The central places of southern Germany (1966). Prentice-Hall, Englewood Cliffs
Dacey M (1966) Population of places in a central place hierarchy. J Reg Sci 6:27–33
Duncan O, Scott R, Lieberson S, Duncan B, Winsborough H (1960) Metropolis and region. Johns Hopkins Press, Baltimore
Friedman J (1961) Cities in social transformation. Comp Stud Soc Hist 4:86–103
Gibrat R (1931) Les inégalités économiques. Recueil Sirey, Paris
Ginsburg N (1961) Atlas of economic development. University of Chicago Press, Chicago
Haggett P (1965) Locational analysis in human geography. Edward Arnold, London
Harris C, Ullman E (1945) The nature of cities. Ann Am Acad Polit Soc Sci 242:7–17
Higgins B (1968) Economic development, 2nd edn. W. W. Norton, New York
Hodge G (1965) The prediction of trade center viability in the Great Plains. Pap Proc Reg Sci Assoc 15:87–115
Hoover EM, Vernon R (1959) Anatomy of a metropolis: the changing distribution of people and jobs within the New York metropolitan region. Harvard University Press, Cambridge
Hoselitz B (1955) Generative and parasitic cities. Econ Dev Cult Chang 3:278–294
Isard W (1956) Location and space-economy. MIT Press, Cambridge
Jefferson M (1939) The law of the primate city. Geogr Rev 29:226–232
Lampard E (1955) The history of cities in the economically advanced areas. Econ Dev Cult Chang 3:81–102
Lösch A (1938) Die räumliche Ordnung der Wirtschaft. Fischer, Jena. English translation of the 2nd edn (1944) by Woglom W, Stolper W: The economics of location (1954). Yale University Press, New Haven
Madden C (1956) On some indications of stability in the growth of cities in the United States. Econ Dev Cult Chang 4:236–252
Mehta S (1964) Some demographic and economic correlates of primate cities: a case for revaluation. Demography 1:136–147
Moomaw R, Shatter A (1993) Urbanization as a factor in economic growth: an empirical study. J Econ 19:1–6
Moomaw R, Shatter A (1996) Urbanization and economic development: a bias toward large cities? J Urban Econ 40:13–37
Morrill R (1963) The development and spatial distribution of towns in Sweden: an historical-predictive approach. Ann Assoc Am Geogr 53:1–14
Parr J (1969) City hierarchies and the distribution of city size: a reconsideration of Beckmann's contribution. J Reg Sci 9:239–253
Parr J (1970) Models of city size in an urban system. Pap Reg Sci Assoc 25:221–253
Parr J, Denike K, Mulligan G (1975) City size models and the economic base: a recent controversy. J Reg Sci 15:1–8
Robson B (1973) Urban growth: an approach. Methuen, London
Rushton G (1971) Postulates of central-place theory and the properties of central-place systems. Geogr Anal 3:140–156
Saichev A, Malevergne Y, Sornette D (2010) Theory of Zipf's law and beyond. Springer, New York
Sheppard E (1982) City size distributions and spatial economic change. Int Reg Sci Rev 7:127–151
Simon H (1955) On a class of skew distribution functions. Biometrika 42:425–440
Smailes A (1944) The urban hierarchy in England and Wales. Geography 29:41–51
Stewart J (1947) Empirical mathematical rules concerning the distribution and equilibrium of population. Geogr Rev 37:461–485
Stewart J (1958) The size and spacing of cities. Geogr Rev 48:222–245
Tinbergen J (1967) The hierarchy model of the size distribution of centers. Pap Reg Sci Assoc 20:65–68
Ullman E (1941) A theory of location for cities. Am J Sociol 46:853–864
Vining R (1955) A description of certain aspects of an economic system. Econ Dev Cult Chang 3:147–195
Vining D (1974) On the sources of instability in the rank-size rule: some simple tests of Gibrat's law. Geogr Anal 6:313–329
Vining D (1977) The rank-size rule in the absence of growth. J Urban Econ 4:15–29
White R (1974) Sketches of a dynamic central place theory. Econ Geogr 50:219–227
White R (1977) Dynamic central place theory: results of a simulation approach. Geogr Anal 9:226–243
White R (1978) The simulation of central place dynamics: two-sector systems and the rank-size rule. Geogr Anal 10:201–208
Williamson J (1965) Regional inequality and the process of national development. Econ Dev Cult Chang 13:3–45
Zipf G (1949) Human behavior and the principle of least effort. Addison-Wesley, Cambridge

Classical Contributions: Von Thünen and Weber

Roberta Capello

Contents
1 Introduction
2 The Location of Activities in Space: Land Rent Formation
2.1 Accessibility and Transportation Costs
2.2 The Location of Agricultural Activities: The von Thünen Model
2.3 The Location of Activities in a City: A Partial Spatial Equilibrium Approach
2.4 The Location of Activities in a City: A General Spatial Equilibrium Approach
2.5 Critical Remarks
3 The Location of Industrial Activities in Space: Weber's Model
3.1 Transportation Costs and Agglomeration Economies
3.2 Weber's Model
3.3 Critical Remarks of the Model
4 Conclusions
References

Abstract

Within location theory, classical models are typically abstract and formalized models in which the main reasoning behind the location choice of firms is the minimization of the transportation costs incurred in reaching natural and intermediate production resources and the territorially dispersed markets for final goods. Classical models are similar in the question they seek to answer: what economic logic explains the location choices of firms in space? This topic is an important one. Although the performance of transport and communication, in terms of time and financial resources, has improved enormously, many economic activities have not become footloose to the extent expressed by the "death of distance." Their location choice still remains anchored to a balance between the economic advantages generated by a physical location – in the form of agglomeration economies – and the transport costs to intermediate or final markets, as explained by these models.


location generating economic advantages – in the form of agglomeration economies – and transport costs to intermediate or final markets, as explained by these models. Keywords

Transportation cost · Location choice · Agglomeration economy · Land rent · High rent

1 Introduction

Economic activity arises, grows, and develops in space. Firms, and economic actors in general, choose their locations in the same way as they choose their production factors and their technology. Location theory aims at explaining the economic rationale behind the choice of a firm to locate in a specific point in space and thus at interpreting the allocation of different portions of territory among different types of production, the division of a spatial market among producers, and the functional distribution of activities in space. This topic is an important one; although the performance of transport and communication, in terms of time and financial resources, has improved enormously, many economic activities have not become footloose to the extent expressed by the "death of distance" (Cairncross 1997). The location choice of firms still remains anchored to a balance between the advantages that a physical location generates for firms – in the form of agglomeration economies – and transport costs, keeping the role of space in modern economies an important issue.

Within location theory, classical models are typically abstract and formalized models in which the main reasoning behind the location choice of firms is the minimization of the transportation costs incurred in reaching natural and intermediate production resources and the territorially dispersed markets for final goods. Location theory defines "transportation costs" as all the forms of spatial friction that give greater attractiveness to a location reducing the distance between two points in space (e.g., the production site and the final market, the place of residence and the workplace, the raw materials market and the production site). Transportation costs are essential to location theory in general, and to industrial location choice models in particular, for a very simple reason: the value of two production resources located in two different points in space can be directly compared only if the physical distance between the two resources is discounted. Transportation costs (i.e., the cost of movement of such inputs in space) represent the discount rate of space, as interest rates represent the discount rate of values in time (Isard 1956).

It is important to notice that in all location theory, transportation costs have a wider meaning than the pure economic cost of shipping and distributing goods; they refer more generally to the opportunity cost represented by the time taken to cover the distance, which could instead be put to other uses, to the psychological cost of the journey, to the cost and difficulty of communication over distances, and to the risk of failing to acquire vital information.

Transportation costs even enter the concept of agglomeration economies, which denotes all the economic advantages accruing to firms from a concentrated location close to other firms, namely, reduced production costs due to large plant size, the presence of advanced and specialized services, the availability of fixed social capital (e.g., infrastructures), the presence of skilled labor and of managerial expertise, and the presence of a broad and specialized intermediate goods market. If transportation costs were nil, there would be no reason to concentrate activities, because doing so would not produce any economic advantage. In this sense, agglomeration economies are "proximity economies": they are, that is to say, advantages which arise from the interaction (often involuntary) among economic agents made possible by the lower amount of spatial friction in concentrated locations.

The logic behind classical location models is therefore the minimization of transportation costs between a production area and the market where goods are sold (the von Thünen model), between the place of residence and the workplace (the Alonso model), and among the production site, the raw materials markets, and the final goods market, which together define a minimum transportation cost location directly compared against agglomeration advantages (the Weber model). Classical models are similar in the question they want to answer: what economic logic explains the location choices of firms? Their richness comes from what they, directly and indirectly, are able to explain. In replying to the question of where activities would be willing to locate their production in space, von Thünen's model indirectly interprets the formation of land rent at different distances from a market place in a general equilibrium setting. Within the same framework of assumptions as von Thünen's model, but in a partial equilibrium approach, Alonso is able to explain the location choice of a new incoming firm in a city; moreover, in a general spatial equilibrium framework, Alonso is able to interpret the allocation of alternative types of activities in the urban space. In replying to the same conceptual question, i.e., "where do activities locate their production?," Weber identifies a location that reflects the two economic forces that organize activities in space and that push the location process in opposite directions: on one side, transportation costs, which – in conditions of perfect competition, perfectly mobile production factors, fixed raw materials, and demand perfectly distributed across the territory – induce dispersion and, on the other, agglomeration economies that call for the spatial concentration of production.

In what follows, the two main classical location theory models are presented. Firstly, the von Thünen-Alonso model is examined, based on the assumption that the production site has a spatial dimension and extends across a territory, while the consumption site (the market) is punctiform. In this way, the model defines a "production area," meaning by this the physical space (the land) occupied by an individual economic activity. Secondly, the Weber model is taken into account, based on the assumptions that both the production and consumption sites are punctiform and that the minimization of the distance between them drives the identification of the minimum transportation cost location, directly compared against a location generating agglomeration advantages. The models interpreting the identification of market areas, assuming that production develops at specific points in space and supplies geographically dispersed markets, are left to the next chapter.

2 The Location of Activities in Space: Land Rent Formation

2.1 Accessibility and Transportation Costs

In the theories examined in this section, location choices are dictated by a specific principle of spatial organization of activity, namely, “accessibility,” and in particular accessibility to a market or a “center.” For firms, high accessibility means that they have easy access to broad and diversified markets for final goods and production factors, to information, and to the hubs of international infrastructures. For people, accessibility to a “central business district” and therefore to jobs means that their commuting costs are minimal, while at the same time, they enjoy easy access to a wide range of recreational services restricted to specific locations (e.g., theaters, museums, libraries) and proximity to specific services (e.g., universities), without having to pay the cost of long-distance travel. High demand for accessibility to central areas triggers competition between industrial and residential activities for locations closer to the market or, more generally, closer to the hypothetical central business district (the city center). The location choice models described in this section have an important feature in common: the cost of land or land rent. In presence of a high demand for central locations, assuming the existence of a single central business district, land closer to the center costs more, a condition accentuated by the total rigidity, at least in the short-to-medium period, of the urban land supply. The models resolve the competition among activities on the basis of a strict economic principle: firms able to locate in more central areas are those able to pay higher rents for those areas. These models envisage one specific factor organizing activity in space: land rent, this being the sole principle which explains location choices by all activities, whether agricultural, productive, or residential. The strength of these models is the elegant and irrefutable logic with which they account for the distribution of productive, agricultural, and residential activities in a geographic space from which they eliminate every differentiating effect except for physical distance from the center. Given their assumptions on the structure of demand and supply in space, these models are particularly well suited to analysis of the location of industrial and residential activities in urban space. In an urban environment, in fact, it is easy to hypothesize the existence of a single business district (a city center) which, for firms, performs the function of collecting, distributing, and exporting the city’s products and, for households, is the place where jobs are available. These models are able to establish where an individual firm will locate. The first model analyzing the spatial distribution of alternative production activities was developed in the early nineteenth century by Johann von Thünen. Only in


the 1960s did pioneering studies by Walter Isard, Martin Beckmann, and Lowdon Wingo prepare the ground for Alonso's formulation of von Thünen's historical model applied to an urban context (Beckmann 1969; Isard 1956; Wingo 1961). The model of the monocentric city soon became a freestanding school of thought within location theory, where it was labeled "new urban economics." This corpus of theories endeavored to develop general equilibrium location models in which the main interest is no longer the decisions of individual firms. Instead, the main areas of inquiry become the definition of the size and density of cities and the identification of the particular pattern of land costs at differing distances from the city center that guarantees achievement of a location equilibrium for all firms in the city.

2.2 The Location of Agricultural Activities: The von Thünen Model

Johann Heinrich von Thünen developed the first location model based on the hypothesis of a continuous production space and a single punctiform final market (von Thünen 1826). His model has generated the entire corpus of theories on the urban location of economic activities. Von Thünen's model is based on a set of assumptions which all subsequent theories would adopt:

(a) There exists a uniform space where all land is equally fertile and transport infrastructures are identical in all directions (isotropic space).
(b) There is a single center, the medieval town, where all goods are traded (i.e., there is a specific market place).
(c) Demand is unlimited, an assumption which reflects the supply-oriented nature of the model: the location equilibrium depends solely on the conditions of supply.
(d) The production factors are perfectly distributed in space: the allocation of land among alternative production activities does not derive from an uneven spatial distribution of the production factors.
(e) There is a specific production function, with fixed coefficients and constant returns to scale, for each agricultural good; this assumption entails that the quantity of output obtainable from each unit of land and the unit cost of production are fixed in space.
(f) Perfect competition exists in the agricultural goods market: farmers therefore take the prices of the goods they produce as given.
(g) Unit transportation costs are constant in space: the total cost of transportation depends on the distance between the production site and the town and on the volume of production. Transportation costs may vary according to the crop.

Assuming the existence of a certain number of farmers, von Thünen addresses the problem of how to determine the allocation of land among farmers working in the area surrounding the market place. He bases his model on a concept of rent as a residual which comes from Ricardo's model and would also characterize subsequent


models. Ricardo was the first to interpret rent as a differential rent (Ricardo 1971, original version 1817). In Ricardo, land rent is formed as a result of the productivity difference between agricultural lands (decay of productive power); if all agricultural lands had the same fertility, rent would be null. In von Thünen, lands are equal in terms of fertility but differ in terms of accessibility to the market; the difference in accessibility explains their different rent values. In von Thünen, the price that farmers are willing to pay for land is the remainder left when transport and production costs, including a certain remuneration (profit) for the farmer, have been subtracted from revenues. In this logic, rent does not enter into the formation of the price of a good, but is formed on the basis of the demand for products. If rent is high, it is because the demand for goods is high, and therefore the final price of goods is high; the high price of goods generates high profits and therefore high margins for rents. This means that rent is formed through the distribution of income rather than through income production. In the words of the classical economist Ricardo, "corn is not high because rent is high, but rent is high because corn is high" (Ricardo 1971, p. 98).

In von Thünen's model, in formal terms, if x is the quantity of a good per unit of land produced by a farmer, c the unit cost of production, p the price of the agricultural good, τ the unit cost of transportation, and d the distance to the market, then rent r is defined as:

r(d) = (p − c − τd)x    (1)

This equation states the levels of rent that farmers are willing to pay for land at different distances from the market place where goods are traded. It is represented graphically by a straight line, with slope −τx and intercepts equal to (p − c)x and (p − c)/τ, respectively, denoting the maximum value of rent in the town and the maximum distance from the town, where land value is nil. From Eq. (1) one can straightforwardly obtain the impact on rent of a shift in space (e.g., of 1 km) by calculating the first derivative of rent with respect to distance:

dr(d)/dd = −τx    (2)

As Eq. (2) shows, the variation in rent is exactly equal to –τx: a shorter distance from the center generates a saving in total transportation costs equal to the increase in the rent required to occupy more central locations. This result is important because it has been obtained by all the models developed since von Thünen’s. It states that rent is nothing other than a saving in transportation costs made possible by more central locations. From this follows the “indifference to alternative locations” condition, which is reached by an individual or a firm when a move in space costs nothing, i.e., when the saving in transportation costs obtained by moving 1 km closer to the center equals the cost of the land that must be purchased to do so.
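As a check on this arithmetic, the following minimal Python sketch evaluates Eq. (1) and verifies numerically that the rent line falls by exactly τx per kilometer; all parameter values are invented for illustration and do not come from the chapter.

```python
# A minimal numeric sketch of Eqs. (1) and (2); parameter values are invented.

def rent(d, p=10.0, c=4.0, tau=0.5, x=2.0):
    """Bid rent per unit of land at distance d: r(d) = (p - c - tau*d) * x."""
    return (p - c - tau * d) * x

p, c, tau, x = 10.0, 4.0, 0.5, 2.0
print(rent(0.0))              # maximum rent at the town: (p - c) * x = 12.0
print((p - c) / tau)          # maximum distance at which rent falls to zero: 12 km
print(rent(5.0) - rent(4.0))  # moving 1 km further out changes rent by -tau * x = -1.0
```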


Fig. 1 Land allocation among three farmers: the von Thünen model. Legend: r(A) willingness to pay of farmer A, r(B) willingness to pay of farmer B, r(C) willingness to pay of farmer C, A area allocated to farmer A, B area allocated to farmer B, C area allocated to farmer C, a′ maximum distance from the center for which A offers a higher rent than farmers B and C, b′ maximum distance from the center for which B offers a higher rent than farmers A and C, c′ maximum distance from the center for which C offers a higher rent than farmers A and B

On the assumption that there are three farmers (A, B, and C), each of them producing a specific agricultural product with a differing degree of perishability, a rent supply curve can be constructed for each farmer. Because goods are perishable to differing extents, the rent supply curves assume different positions and slopes (Fig. 1). The farmer who produces the most perishable good will have a productive process that uses the land in the most intensive and economically efficient way (geometrically, the highest intercept on the vertical axis, equal to (p − c)x), and he will be more willing to pay the rent charged for land 1 km closer to the town (geometrically, the steeper slope of the straight line, equal to −τx). As the farmers compete for the more accessible land, each unit of surface area will be allocated to the farmer willing to pay the highest rent for that land. As far as a′, the land will be allocated to farmer A, who offers the highest rent for the most central locations; from a′ to b′ to farmer B; and from b′ to c′ to farmer C: the actual rent realized by the landowner from cultivation of his land is the envelope of the three rent supply curves. This model is striking for its strong interpretative power. Assuming a homogeneous plain, with no economic activities located in this geographical space, the model is


able to explain the formation of agglomeration simply through distance from, or accessibility to, the town (expressed by transportation costs), which accounts for differences in land rent. Moreover, assuming equal fertility of land everywhere, the model departs from the classical Ricardian view that land profitability is explained by different degrees of fertility (Ricardo 1971; orig. edn 1817): in this model, it is simple distance from, or accessibility to, the town that accounts for differences in land rent. By eliminating everything except the distance between land and the town from concrete geographic space, von Thünen defines a new type of space, namely, economic space.
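The allocation mechanism of Fig. 1 is easy to reproduce numerically. The sketch below, with invented prices, costs, freight rates, and yields for the three farmers, assigns each distance to the highest bidder and locates the boundaries a′, b′, and c′ where the winning bid changes.

```python
import numpy as np

# Illustrative sketch of Fig. 1: each unit of land goes to the highest bidder,
# and realized rent is the envelope of the bid curves. Parameters (p, c, tau, x)
# are invented for illustration.
params = {"A": (12.0, 4.0, 1.0, 3.0),    # most perishable crop: high intercept, steep slope
          "B": (10.0, 4.0, 0.5, 2.0),
          "C": (8.0, 4.0, 0.25, 1.2)}    # least perishable crop: flat bid curve

def bid(farmer, d):
    p, c, tau, x = params[farmer]
    return (p - c - tau * d) * x

grid = np.linspace(0.0, 25.0, 2501)
winners = []
for d in grid:
    bids = {f: bid(f, d) for f in params}
    top = max(bids, key=bids.get)
    winners.append(top if bids[top] > 0 else "-")   # "-" = beyond the margin of cultivation

# Report the boundaries a', b', c' where the winning farmer changes:
for i in range(1, len(grid)):
    if winners[i] != winners[i - 1]:
        print(f"{winners[i-1]} -> {winners[i]} at d ~= {grid[i]:.2f} km")
```

With these numbers the boundaries fall at roughly 6 km (A to B), 10.3 km (B to C), and 16 km (margin of cultivation), illustrating the concentric rings of the model.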

2.3 The Location of Activities in a City: A Partial Spatial Equilibrium Approach

In the early 1960s, first William Alonso and then Richard Muth reconsidered von Thünen's model and adapted it to an urban context (Alonso 1960, 1964a, b; Muth 1968, 1969), thus paving the way for numerous subsequent studies. Alonso and Muth extended the bases of von Thünen's pioneering model, making it more specific to the urban case, but they also made it more general by abandoning the hypothesis that only transportation costs express spatial friction and the preference for more central locations. The assumptions of Alonso's model are the same as those of von Thünen's model of agricultural activity described above. It envisages a city (no longer a plain) characterized by uniform space (homogeneous spatial distribution of the production factors) and endowed with infrastructures which cover the entire city in all directions (isotropic space). The city has a single center – the city center or business district – which is generically defined as the most attractive location for all firms. Given these assumptions, the city is analyzed along only one dimension: a radius comprising different distances from the city center to the periphery.

The base model addresses a problem similar to the one which preoccupied von Thünen. As firms compete for central locations, the model shows how the urban space is allocated among alternative types of production once the market cost of land at different distances from the center is known. Alonso's model, too, defines rent as the remainder left when the entrepreneur has subtracted production costs (including transportation costs) and a desired level of profit from the revenue obtained by selling the good. Formally, rent is expressed as

r(d) = (px − π − c(d))x(d)    (3)

where r denotes the rent, px the unit price of the good produced by the entrepreneur, c the production costs per unit of land (including transportation costs) at distance d from the city center, π the profit per unit of land, and x the quantity of the good produced per unit of land at distance d from the city center. Because production costs include transportation costs, in the Alonso model they depend on distance, as they do in von Thünen's model. However, unlike in the latter, revenues too depend on distance: a more central location gives greater proximity to broader markets and consequently access to higher earnings (consider the sales of a


shop located in the city center compared to one in the periphery, especially if they sell luxury items). Equation (3) expresses the "bid rent," or the rent (per square meter) that the entrepreneur is willing to pay at differing distances from the center, once costs and the entrepreneur's intended profit have been subtracted from revenues. Profits remaining equal, a more central location implies a willingness to pay higher rent because the entrepreneur incurs lower transportation costs and obtains higher revenues. Likewise, a suburban location can yield the same profit if and only if less rent is paid for the land: the saving on land cost must offset the higher transportation costs and the lower revenues that less central locations entail (Fig. 2a). At every distance from the center (e.g., d0 in Fig. 2b), if the firm wants to increase its profits, it must offer a lower rent. Vice versa, at the same distance, it can offer a higher rent if it is willing to accept lower profits. It is therefore possible to plot different bid-rent curves for an individual firm, all of them with the same slope and each of them defined on the basis of a different profit level, which increases toward the origin (Fig. 2b). In a partial equilibrium framework, which assumes as known the "market land rent curve" (i.e., the real market cost of land, expressed by curve re in Fig. 2b), it is possible to define the optimal location for the firm. Along the market land rent curve (re), the firm will choose the location yielding the highest profit, which is expressed by the tangency of the market rent curve with the lowest bid-rent curve. In Fig. 2b, the location equilibrium is reached at point E, and thus at a distance d0 from the center and with a rent equal to r(d0) (Fujita 1985).
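The tangency solution can be illustrated numerically. In the sketch below, the revenue, cost, and market rent schedules are invented functional forms chosen only so that an interior optimum exists; the firm simply picks, along the market rent curve, the distance at which its profit is greatest.

```python
import numpy as np

# Sketch of the partial-equilibrium location choice: along a known market land
# rent curve re(d), the firm picks the distance that maximizes profit.
# The functional forms below are invented for illustration only.

d = np.linspace(0.0, 10.0, 1001)
revenue = 30.0 * np.exp(-0.15 * d)      # revenues fall with distance from the center
cost = 10.0 + 0.8 * d                   # production costs rise with transport distance
market_rent = 18.0 * np.exp(-0.35 * d)  # observed market land rent curve re(d)

profit = revenue - cost - market_rent
d_star = d[np.argmax(profit)]
print(f"equilibrium location d0 ~= {d_star:.2f}, profit {profit.max():.2f}")
# At d0 the market rent curve is tangent to the lowest attainable bid-rent
# curve: the bid rent revenue(d) - cost(d) - max_profit equals re(d) at d0
# and lies below re(d) everywhere else.
```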

2.4 The Location of Activities in a City: A General Spatial Equilibrium Approach

If we discard the assumption that a city already exists (a market land rent exogenously determined), and therefore move away from the identification of an urban location for a new firm entering a city, we abandon the partial spatial equilibrium model and return to von Thünen's general spatial equilibrium framework, which entails an interesting interpretation of the allocation of urban space between alternative production activities or between production and residential activities.

Fig. 2 The bid-rent curve and the location equilibrium for different types of activities. (a) Bid-rent curve. (b) Location equilibrium for each type of activity


Suppose the existence of a point in space (a center) attracting firms because of the presence of a market for their goods. Different types of activities compete to locate closest to the center in a homogeneous space around it. Let us assume that there exist types of activities with different propensities for central location. The slopes of the bid-rent curves will differ according to the different levels of propensity for central location; as the propensity increases, firms will be willing to pay more for land in order to locate (one unit of distance) closer to the center (Fig. 3). The three types of activities are distributed across the urban area as in von Thünen's model: each area will be occupied by the activities that make the highest rent bid. Market land rent will be the envelope of the bid-rent curves at each distance from the center, so that the city can be depicted as a set of concentric rings, each containing the activities willing to pay the highest rent for that distance (Fig. 3). But what determines the propensity for a central location? To answer this question, it is helpful to reason about the slope of the bid-rent curve, which expresses the variation in the cost of land due to a one-unit variation in the distance from the center. The slope is given by

Fig. 3 Location equilibrium for activities with different propensities for central location. Type of activity 1 = high propensity for central location. Type of activity 2 = average propensity for central location. Type of activity 3 = low propensity for central location. r1 = rent bid by activity 1, r2 = rent bid by activity 2, r3 = rent bid by activity 3


∂r(d)/∂d = (px − π − c(d)) ∂x(d)/∂d − x(d) ∂c(d)/∂d    (4)

This shows that, one unit of distance further away from the center, the rent offered to maintain the same profit level π diminishes because of increased transportation costs and decreased revenues. Eq. (4) contains the four elements that, on their own or in combination, theoretically explain a higher propensity of activities for central location; an activity will in fact be interested in locating in the center if:

• The cost of moving one unit of good toward the center (∂c(d)/∂d) is high.
• The influence of distance on the demand for goods (∂x(d)/∂d) is high.
• Extra profits (px − π − c(d)) are high.
• The value of activities per unit of land (x) is high.
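A quick symbolic check of this decomposition is possible. The sketch below, using the sympy library, differentiates the bid rent of Eq. (3) and recovers the two terms of Eq. (4); the symbol names are chosen to mirror the chapter's notation.

```python
import sympy as sp

# Symbolic check of Eq. (4): differentiating the bid rent of Eq. (3)
# reproduces the demand term and the transport-cost term discussed above.
d = sp.Symbol('d')
p_x, pi_ = sp.symbols('p_x pi')
c = sp.Function('c')(d)   # unit production cost (incl. transport) at distance d
x = sp.Function('x')(d)   # quantity produced and sold per unit of land

r = (p_x - pi_ - c) * x   # Eq. (3)
print(sp.diff(r, d))
# -> (p_x - pi - c(d))*Derivative(x(d), d) - x(d)*Derivative(c(d), d)
```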

Table 1 presents some examples of activities characterized by the highest values of the slope of the bid-rent curve:

(a) Activities oriented toward a high demand density, like commercial activities, shopping centers but also specialized shops or luxury goods shops, all characterized by a strong influence of distance on the demand for goods.
(b) Advanced service functions (e.g., lawyers, specialized doctors) or activities that require a prestigious location, which they can obtain thanks to their oligopolistic position (banks, insurance companies, public and private managerial functions); their costs of moving one unit of good toward the periphery and the influence of distance on the demand for goods sold per unit of land are low, but the extra profits of a central location are very high. Through a central location, these activities abandon a perfectly competitive market and differentiate their product quality through the use of a traditional urban input factor, like information.
(c) Travel agencies and insurance brokers, all characterized by a very high value of their activity per unit of land.

Table 1 Taxonomy of activities with high propensity for central location

∂c(d)/∂d | ∂x(d)/∂d | px − π − c(d) | x      | Activities
Low      | High     | Normal        | Normal | Commercial activities, shopping centers but also specialized shops or luxury goods shops
Low      | Normal   | High          | Normal | Advanced service functions (e.g., lawyers, specialized doctors), or activities that require a prestigious location
Low      | Normal   | Normal        | High   | Travel agencies, insurance brokers
High     | Normal   | Normal        | Normal | Activities dependent on population and on other central activities
High     | Normal   | High          | High   | Financial intermediaries

Source: Camagni 1992


(d) Activities that depend on a central market and have a high transportation cost for the final output: all industrial and service activities that depend on population and central activities.
(e) Financial activities mostly linked to the stock exchange, whose propensity for a central location is the result of high transportation costs, monopolistic profits, and a high value of activity per unit of land.

These reflections already provide a first piece of evidence of how these models, strongly abstract in their nature, are able to describe conditions which closely match actual reality.

The same logic as in Fig. 3 can be applied to a simultaneous analysis of the location of firms and households. On the hypothesis that the rent gradient of firms is higher than that of households (i.e., firms are willing to pay higher unit rents in order to move one unit of distance closer to the center, as is usually the case in the real world), the bid-rent curves for firms and households will be those shown in Fig. 4. This model leads to two important results. The first is that it identifies the bid-rent curves of firms and households endogenously. Let us assume that at time t0, households choose a level of rent r0(d0) characterized by a certain level of utility. For equilibrium to come about, the level of utility must be such that it determines an amount of population and a labor supply equal to the labor demand of firms. If households have chosen too high a level of utility and therefore make rent bids which are too low, the population located in the city (in the range d1 − d′max) may be insufficient to satisfy the labor demand of firms. The availability of work will attract new households into the city, with a consequent increase in demand for urban land which pushes the bid-rent curve up to r1(d1) in Fig. 4. The city will expand (to d″max) until labor-market equilibrium has been reestablished at a lower level of utility. The second important result of this model is that it divides the urban area between productive and residential activities. Urban land will be allocated to the activities able to pay a higher rent at each distance from the center – as in von Thünen's model.

Fig. 4 Location equilibrium for households and firms. Legend: r(da) agricultural land value

Fig. 5 Urban rent gradients in some major European cities (rental values per sqm per annum in euros, 2003) – retail. (Source: Capello 2007, p. 60)

In this case, the central areas will be occupied by firms, while households will be pushed toward suburban areas: a theoretical result which closely reflects what actually happens in reality. As a final remark, although these models are highly abstract, owing to their unrealistic hypotheses (isotropic space, a city with a single center), they are able to describe conditions which closely match actual reality: an urban land rent gradient that declines with distance from the center, with central locations occupied by business and management functions and broad suburban spaces left to residential activities (Fig. 5).

2.5 Critical Remarks

Despite their logic, elegance, and economic rigor, these models have a number of theoretical elements which weaken their overall logical structure. One of them is the decisive role played by commuting in determining location equilibrium. If real behavior does not comply with the perfect rationality envisaged by the models, so that commuting is of less importance for a person's utility, the entire theoretical-conceptual edifice collapses. This shortcoming can be partly remedied, however, if we acknowledge that the costs of transport to the center and the desire to reduce them may reflect other important aspects of an individual's utility function when she/he makes location choices, like accessibility to information, recreational services, and opportunities for social interaction.

A second shortcoming is more serious. These models concern themselves neither with how a city center is organized nor with what happens outside the city itself. They restrict themselves to interpreting locational behavior within the area extending between a hypothetical aspatial center and the physical boundary of the city. Moreover, when these models are used to interpret location equilibrium not within a city but among cities, and therefore on the hypothesis that the city is part of


an urban system and that firms may decide to relocate to other cities offering attractive, higher levels of utility or profit, they display a clear interpretative weakness. On the hypothesis that firms have equal production functions, there can only be indifference to alternative locations in other cities if all these exhibit – in the logic applied here to describe them – the same bid-rent curve and the same boundary-rent curve and are therefore all of the same size. If this is the case, there will be an urban system made up of cities which are all of equal size, but this circumstance is amply contradicted in the real world. In order to deal with this defect, the conceptual framework should be able to accommodate the hypothesis that locational advantages differ according to the size of the city and that rents – the monetary counterpart of the advantages that firms obtain from central urban locations – vary (distance from the center remaining equal) from city to city. Only thus is it possible to conceive a location equilibrium with cities of different sizes. Yet this also requires acceptance of the idea that large, medium-sized, and small cities are structurally different, perform different functions in the overall economy, and consequently have specific production specializations: a hypothesis at odds with the basic features of these models and which instead opens the way for the general equilibrium models discussed in chapter ▶ "Labor Market Theory and Models" of this handbook.

3 The Location of Industrial Activities in Space: Weber's Model

3.1 Transportation Costs and Agglomeration Economies

The aim of other kinds of location models is to explain the location choices of firms by considering the two great economic forces that organize activities in space: transportation costs and agglomeration economies. These forces push the location process in opposite directions since they simultaneously induce both the dispersion and the spatial concentration of production. It is because of agglomeration economies that spatial concentration comes about. Widely used in regional economics, the term "agglomeration economies" denotes all the economic advantages that accrue to firms from locating close to other firms, i.e., from the concentration of economic activities in space. However, there are two forces which work in the reverse direction and give rise to dispersed location. The first is the formation in the agglomeration area of increasing costs or diseconomies, these being (i) the prices of less mobile and scarcer factors (land and labor) and (ii) the congestion costs (noise and air pollution, crime, social malaise) distinctive of large agglomerations. These diseconomies are generated above a certain critical threshold. The second factor – transportation costs – is of greater interest, however, because these costs countervail the spatial concentration of activities whatever level of agglomeration has been reached. In conditions of perfect competition, perfectly mobile production factors, fixed raw materials, and demand perfectly distributed across the territory, the existence of transportation costs may


erode the advantages of agglomeration until activities are geographically dispersed and the market becomes divided among firms, each of which caters to a local market. The way in which these two opposite forces can be taken into account at the same time is elegantly presented in Weber's model, described hereafter. As we will see, unlike the location models described in Sect. 2, which identify just one factor organizing activity in space (land rent), Weber's model envisages a location equilibrium that differs according to the spatial principle patterning activities in space (agglomeration economies rather than minimum transportation costs).

3.2 Weber's Model

One of the first and best-known studies on the spatial concentration of industry dates back to 1909. In that year, the economist Alfred Weber constructed an elegant location model where the costs of transportation among production site, raw materials markets, and the final goods market (which together define a minimum transportation cost) are directly compared against localization economies (Weber 1929). The prevalence of one element over the other determines the geography of industry location. Weber's model is based on the following simplifying assumptions:

(a) There is a punctiform market for the good (C in Fig. 6a).
(b) Two raw materials markets, these too punctiform, are located at a certain distance from each other (M1 and M2 in Fig. 6a).
(c) There is perfect competition in the market, i.e., firms are unable to gain monopolistic advantages from their choice of location.
(d) Demand for the final good is price inelastic. Demand is said to be inelastic when the price of a good changes but the quantity of the good demanded (or supplied) varies less than proportionally or remains the same.
(e) The same production technique is used in every possible location; production costs are therefore given and constant.

The location choice results from a complex calculation performed in two stages. In the first, the firm looks for the location that assures the minimum transportation cost between the production site, the raw materials markets, and the final market for the good produced. In the second stage, the firm compares the advantages of agglomeration (localization economies) against the higher transportation costs that it would incur by choosing the new location instead of the one with minimum transportation costs.

The first stage of the calculation identifies the location that assures minimum transportation costs. Let x and y be the tonnes of raw materials present, respectively, in markets M1 and M2 and required to produce one unit of output, and let z be the tonnes of the finished good to be transported to the final market C. Total transportation costs (CT) are expressed as a function of the weight of the good to be transported and the distance to cover:


CT = xa + yb + zc    (5)

where a, b, and c are, respectively, the distances in kilometers between the raw materials markets and the production site and between the latter and the final market and xa, yb, and zc represent the “forces of attraction” that push the firm, respectively, toward points M1, M2, and C (Fig. 6a).
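Weber's mechanical solution, described below, has a simple numeric counterpart: the point of minimum transportation cost is the weighted geometric median of M1, M2, and C, which can be approximated by Weiszfeld's classical iteration. The sketch below uses invented coordinates and weights.

```python
import numpy as np

# Numeric counterpart of Weber's mechanical solution: the point minimizing
# CT = xa + yb + zc is the weighted geometric median of M1, M2 and C, found
# here by Weiszfeld's classical iteration. Coordinates and weights are invented.

points = np.array([[0.0, 0.0],   # M1, market of the first raw material
                   [4.0, 0.0],   # M2, market of the second raw material
                   [2.0, 3.0]])  # C, market of the final good
weights = np.array([2.0, 2.0, 3.0])  # tonnes x, y, z moved per unit of output

p = points.mean(axis=0)          # start inside the locational triangle
for _ in range(500):
    dist = np.linalg.norm(points - p, axis=1)
    if dist.min() < 1e-12:       # the iterate has reached a vertex: stop
        break
    w = weights / dist
    p = (w[:, None] * points).sum(axis=0) / w.sum()

ct = float((weights * np.linalg.norm(points - p, axis=1)).sum())
print(f"least-cost location P ~= ({p[0]:.3f}, {p[1]:.3f}), CT = {ct:.3f}")
# With these numbers P ~= (2.000, 2.268): an interior point pulled toward C,
# since none of the three forces of attraction dominates the other two.
```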

Fig. 6 Weber's location equilibrium. (a) The locational triangle: choosing the location with the minimum transportation costs. (b) The agglomeration areas. Legend: x and y tonnes of raw materials, z tonnes of final goods, M1 market of one raw material, M2 market of another raw material, C market of the final good, A, B, C, and D different firms. An isodapane is the geometric locus of points of constant additional transportation cost with respect to the least-cost location


The minimum cost location solution can be identified:

• At a point inside the triangle formed by joining M1, M2, and C, if none of the "forces of attraction" exceeds the sum of the other two. In economic terms, this situation occurs when the cost of transporting the z tonnes of the good 1 km further away from the outlet market is less than the costs of transporting the x and y tonnes of raw materials 1 km further away from their source markets.
• At corner C of the triangle, i.e., the final market, if the sum of the costs of transporting the x and y tonnes of raw materials 1 km further away from their markets is less than the cost of transporting the z tonnes of final good produced one extra kilometer. This situation comes about because of the greater relative weight, in the composition of the finished product, of ubiquitous raw materials with respect to those that must be transported. Weber calls this condition "market oriented."
• At a point closer to the raw materials markets, if the sum of the costs of transporting the x and y tonnes of raw materials 1 km more is greater than the extra cost of transporting the z tonnes of the finished good. This situation can be explained by the lesser relative weight, in the composition of the final good, of ubiquitous raw materials with respect to localized raw materials and/or by the product's loss of weight during the manufacturing process. Weber calls this location "raw-material oriented."

Weber provides a practical solution to the problem of identifying the minimum point. He hypothesizes a triangular board (the location triangle) in which three holes are drilled at the vertices M1, M2, and C. Threads are passed through these holes (Fig. 6a), and their ends are knotted together on the upper surface of the board. Weights respectively proportional to x, y, and z are attached to the other ends of the threads below the board. The point at which the knot of the three threads comes to rest on the upper surface of the board corresponds to the point of minimum transportation costs.

In the second stage of the location choice process, the firm compares the least-cost location with an alternative one where it can enjoy localization economies – for instance, the availability of labor at lower cost and/or of better quality. Assuming that P in Fig. 6a is the location point with the lowest transportation costs, Weber describes the "isodapanes": curves along which the additional transportation cost that the firm must pay in order to cover a certain distance from the least-cost location remains constant. On the assumption that other firms operate in the same sector and that these firms obtain advantages from concentrated location such that they have a pecuniary advantage equal to v, the decision to relocate will be taken if and only if each firm's isodapane measuring an extra transportation cost equal to the agglomerative advantage (v) intersects with the isodapanes of the other firms. In this case, in fact, within the area of intersection, the additional transportation costs are less than the advantages generated by concentrated location. In Fig. 6b, firms A, B, and C find themselves in this situation, and they relocate, but not so firm D, for which the agglomerative advantage is no greater than the additional transportation cost.
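The second-stage comparison can likewise be sketched numerically. In the illustration below, each firm relocates to a candidate agglomeration point Q only if the extra transport cost over its own least-cost location does not exceed the agglomerative advantage v, i.e., only if Q lies within its critical isodapane. All locations, weights, and the value of v are invented (a firm named C is omitted to avoid confusion with the market point C).

```python
import numpy as np

# Stage-two sketch: a firm relocates to the common point Q only if the extra
# transport cost relative to its own least-cost point stays below v.

def ct(site, pts, w):
    """Total transport cost at a site for markets pts with weights w."""
    return float((w * np.linalg.norm(pts - site, axis=1)).sum())

v = 1.5                        # pecuniary advantage of joint location
Q = np.array([2.5, 2.2])       # candidate agglomeration point

firms = {  # per-firm markets (M1, M2, C) and weights (x, y, z), all invented
    "A": (np.array([[0., 0.], [4., 0.], [2., 3.]]), np.array([2., 2., 3.])),
    "B": (np.array([[1., 0.], [5., 1.], [3., 3.]]), np.array([1., 2., 3.])),
    "D": (np.array([[8., 6.], [9., 5.], [9., 7.]]), np.array([1., 1., 2.])),
}

g = np.linspace(-1.0, 10.0, 111)
xx, yy = np.meshgrid(g, g)
sites = np.column_stack([xx.ravel(), yy.ravel()])

for name, (pts, w) in firms.items():
    # approximate the firm's own least-cost point by a vectorized grid search
    d = np.linalg.norm(sites[:, None, :] - pts[None, :, :], axis=2)
    best = (d * w).sum(axis=1).min()
    extra = ct(Q, pts, w) - best
    print(name, "relocates" if extra <= v else "stays", f"(extra cost {extra:.2f})")
```

With these numbers, firm A relocates while B and D stay put, mirroring the logic of Fig. 6b: relocation occurs only where the critical isodapanes overlap.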

3.3 Critical Remarks of the Model

Weber’s model has made a permanent and major contribution to industrial location theory. Its principal merit is that it uses entirely rational modes of reasoning: for instance, comparison between the advantages of an alternative location and the additional transportation costs that it would generate. Nevertheless, the model has a number of shortcomings: • Its static nature. The model identifies the least-cost location on the basis of productive efficiency, but it ignores dynamic aspects such as innovation at the microeconomic level, while, at the macroeconomic one, it neglects changes in income distribution and in the relationships among agglomeration advantages, rents, and wages. • Its transport-oriented nature. The cost of transportation defines first and foremost the most efficient location; only subsequently does it identify alternative locations. Some critics have claimed that this approach is less efficient than one based on the direct search for a point of minimum total production cost. • Its abstractness, which makes the least-cost location difficult to calculate in real settings. It is rather unlikely, in fact, that the weight of raw materials in the final weight of the good can be calculated, distinguishing inter alia the weight of the raw materials to be transported from those present at the production site. • Its nature as a partial equilibrium model which entirely neglects possible interactions among firms. • Its supply-side bias. The criticism most frequently made of the model is that it is excessively oriented to the supply side: it makes no mention of demand factors, assuming that demand is unlimited and inelastic to price variations. The main strength of the model is that for the first time it opened the way to reflections on the importance of agglomeration economies in location choices. The importance and power of agglomeration economies in the real world is evident. The number of concentrations of economic activity in space is very high. Local districts or clusters of SMEs denote a local area with a strong concentration of small- and medium-sized firms, each specialized in one or a few phases of the production process (or activities subsidiary to it) serving the needs of the area’s principal sector. In this sense, they reflect Weber’s concept of localization economies, i.e., those advantages stemming from the physical proximities of firms belonging to the same sector. In the real world, firms do not only cluster to achieve increased static efficiency of their production processes (i.e., an increase in firms’ revenues or a decrease in their costs); firms get also dynamic efficiency in the form of increases in their innovative and creative capacity. Cases in point are Silicon Valley in California, “Route 128” in the Boston area, Baden-Württemberg in the South of Germany, Jutland in Denmark, Småland in Sweden, and Sophia Antipolis close to Nice, to cite only some examples. Weber’s model does not explain the existence of agglomeration economies and gives them for granted. However, it has to be recognized that the intuition to highlight them as an important element in the choice of firms’ location has opened


the way to a long stream of theoretical reflections seeking to interpret the formation of production agglomerations in space.

4 Conclusions

This chapter has surveyed the two groups of location theories developed to explain the determinants of location choices by industrial firms. First, it described models of a strictly neoclassical nature, which seek to account for the allocation of land between alternative activities within a spatial structure of uniform supply in space and a punctiform source of demand. High demand for access to central areas triggers competition among firms, or between firms and households, to obtain locations closer to the market or, more generally, to a hypothetical central business district. Land rent is the main factor that organizes activities in urban space. According to strict economic logic, competition for land closer to the center is resolved by its allocation to the activities able to pay higher rents. The virtues of these models are their rigor and their stringent economic logic. Their main weakness emerges when they set out to explain the location choices made by firms between cities with different levels of utility or profit. Indifference to alternative locations, which is the long-period equilibrium condition, is guaranteed if and only if cities offer the same utility and the same profit and therefore, according to the models' logic, if and only if cities are of the same size. Yet this implies the existence of an urban system consisting of cities which are all of the same size – a circumstance widely belied by reality. In order to understand the economic reasons for the existence of urban systems with cities of different sizes, consideration must be made of the functional characteristics of cities. This is an aspect which the models described thus far are unable to handle and which is instead addressed by the models discussed in the next chapter.

The second kind of model presented in this chapter is Weber's. On the hypothesis that demand and supply structures are punctiform in space, the model elegantly and convincingly explains the existence of territorial agglomerations on the basis of two great economic forces which induce either the concentration or the dispersion of activities in space: agglomeration economies on the one hand and transportation costs on the other. Still today, these forces are components of more modern, and in certain respects more complex, models which seek to combine location choices with local growth dynamics (see section "Location and Interaction" of this handbook), and it is on the balance between them that the geographical organization of activities itself depends.

References

Alonso W (1960) A theory of the urban land market. Pap Proc Reg Sci Assoc 6:149–157
Alonso W (1964a) Location theory. In: Friedmann J, Alonso W (eds) Regional development and planning: a reader. MIT Press, Cambridge, MA, pp 78–106
Alonso W (1964b) Location and land use: towards a general theory of land rent. Harvard University Press, Cambridge, MA
Beckmann MJ (1969) On the distribution of urban rent and residential density. J Econ Theory 1(1):60–68
Cairncross F (1997) The death of distance. Harvard Business School Press, Cambridge
Camagni R (1992) Economia Urbana: Principi e Modelli Teorici. La Nuova Italia, Rome
Capello R (2007) Regional economics. Routledge, London
Fujita M (1985) Urban economic theory: land use and city size. Cambridge University Press, Cambridge, MA
Isard W (1956) Location and space-economy. MIT Press, Cambridge, MA
Muth R (1968) Urban residential land and housing markets. In: Perloff H, Wingo L (eds) Issues in urban economics. The Johns Hopkins Press, London, pp 285–333
Muth R (1969) Cities and housing. The University of Chicago Press, Chicago
Ricardo D (1971) Principles of political economy and taxation. Penguin Books, Harmondsworth (orig edn 1817)
von Thünen JH (1826) Der isolierte Staat in Beziehung auf Landwirtschaft und Nationalökonomie. Perthes, Hamburg
Weber A (1929) Alfred Weber's theory of the location of industries. University of Chicago Press, Chicago (orig edn: Über den Standort der Industrien. Verlag Mohr, Tübingen, 1909)
Wingo L (1961) Transportation and urban land. Resources for the Future, Washington, DC

Market Areas and Competing Firms: History in Perspective

Folke Snickars

Contents
1 Introduction
2 Theoretical Background
3 Theoretical Modeling of Location Choices
4 Basic Modeling Principles and Assumptions
5 Experimenting with the Hotelling Model
6 Analysis of the Simulation Results
7 Conclusions
References

Abstract

Location theory has traditionally been based on equilibrium concepts. Dynamics have been introduced mainly to ascertain whether there are paths leading to the equilibrium states. The modeling of dynamics has been very simple, yet it has involved both locational changes and price changes. Notions of market areas and competition between firms have been at the core of location analysis. Although classical location theory was developed in a regional context, the models have found a number of recent applications in urban analysis where interdependencies and dynamics are central elements. The theoretical contributions of Hotelling, Hoover, and Palander form cornerstones for the discussion in the current chapter. In this chapter, we will mainly dwell in the Hotelling tradition and use the theories of Hoover and Palander as introductory and complementary inputs. The chapter presents a series of behavioral models in the spirit of the classical Hotelling location game involving the spatial location of suppliers (sellers) and consumers (customers) in an urban context. The models have been established within a cellular automata framework. The location models studied assume fixed prices. The location of sellers is determined by the relative accessibility to customers and the competition between sellers for customers. Using the techniques of cellular automata, a set of simulations will be performed to discuss equilibrium states of customer-seller systems. The discussion will serve to illustrate some elements of location theory under different levels of complexity.

Keywords

Market share · Cellular automaton · Location theory · Urban system · Relative accessibility

F. Snickars (*)
Department of Urban Planning and the Environment, KTH Royal Institute of Technology, Stockholm, Sweden
e-mail: [email protected]

1 Introduction

Some scientific articles have become classics in their field. This is the case with the theoretical contributions of Hotelling, Hoover, and Palander. Their works form one of the cornerstones for the discussion in the current chapter. We will focus mainly on the Hotelling tradition and use the theories of Hoover and Palander as introductory and complementary inputs. Techniques of cellular automata are used to investigate how fundamental principles of dynamics and evolution which replicate the theoretical results of locational analysis can be applied also in more complex spatial arrangements. This type of analysis has been demonstrated to be useful in studying a wide range of dynamic systems, including spatial urban systems (White and Engelen 1997; Semboloni 2000), political systems (Downs 1957), and innovation systems (see, e.g., Rasmusen 1989; Leydersdorff 2002).

It is common to consider industrial location within conventional general equilibrium theory, in which everything is assumed to happen at one point in space. Two fundamental questions have to be distinguished: Where will production take place? And, given the place of production, the competitive conditions, factory costs, and transportation rates, how does price affect the extent of the area in which a certain producer can sell his goods? One of the fundamental research issues in location theory is the boundary of the market areas of spatially located firms. The simple case of two firms making the same product for linear markets, where consumers are uniformly distributed on a line or along a street, was developed among others by Palander (1935). Palander argues that the price charged at a certain location (the delivered price) is the price charged for the product at the plant plus the necessary cost of transportation from the plant to the location in question. This is illustrated in Fig. 1.

The boundary of the two markets will be the point where the delivered prices from both producers are equal. This is the point where customers will be indifferent as to which firm they buy from. The size of the market area influences the profit. With the cost of production and profit per unit of output given, total profits become a function of the distance from the plant over which a firm can extend its market.
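Palander's boundary condition lends itself to a one-line computation: the boundary X between plants A and B, located a distance D apart on the line, solves pA + tA·X = pB + tB·(D − X). The sketch below uses invented prices and freight rates.

```python
# Sketch of Palander's boundary condition; all figures are invented.

pA, tA = 4.0, 0.5     # price at plant A and its freight rate per km
pB, tB = 5.0, 0.4     # price at plant B and its freight rate per km
D = 10.0              # distance between the two plants

X = (pB - pA + tB * D) / (tA + tB)   # boundary distance measured from plant A
print(f"boundary at {X:.2f} km from A; delivered price there: {pA + tA * X:.2f}")
# Both delivered prices equal 6.78 at X ~= 5.56 km, so customers there are
# indifferent between the two producers.
```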

Fig. 1 The classical result of Palander's market area theory

The assumption of perfect competition was also used by Hoover (1938, 1948). Using the same setting as developed by Palander and introducing the condition of diminishing returns to scale in the production function of the firm, Hoover arrives at the conclusion that, in the absence of production cost differences, the best location will be at the point of minimum transport costs (see Fig. 2). Market area boundaries between different producers arise from areal variations in production costs and delivered prices.

In the early literature, market areas were separated from each other for firms having a fixed set of locations. The early location theorists found the problem of optimum location for the individual firm intractable. As soon as the interdependence of firms is accepted, with the possibility that the action of one firm in locating itself can require the relocation of existing firms, the problem becomes too complex for mathematical formulation.

The third tradition from classical location theory was the spatial competition analysis put forward by Hotelling (1929). That analysis differed in at least two respects from those of the other theorists. The first was that location was not given but a matter of competition. The other was that the analysis was dynamic, or could at least be interpreted in a dynamic way. The analysis was aimed at providing a theory which would apply to industries which would choose their location to compete for the best position.

The problem of the location of interdependent firms in space was formulated and solved in strict theoretical terms by Koopmans and Beckmann (1957). They also showed that there was no price system which would sustain the equilibrium location in their quadratic assignment problem. It was shown by Heffley (1972) and Snickars (1978) that the reason for this complexity was the non-convexity of the quadratic assignment matrix. The nature of transportation costs implied that the matrices representing the assignment parameters, in essence, would never be diagonally dominant.

The following discussion will build on the tradition of Hotelling (1929) and combine it with game theory and the theory of cellular automata to address the question of spatial competition involving many competitors and several different demand schemes and competitive conditions.


Fig. 2 The classical result of the Hoover location theory

One reason for this choice of approach is that it is important to follow the dynamics of the system and thus to frame the classical theories in a modern theoretical and computational context.

2 Theoretical Background

Location theory has traditionally been based on equilibrium concepts. Dynamics have been introduced mainly to ascertain whether there are paths leading to the equilibrium states. The modeling of dynamics has been simple, yet it has involved both locational changes and price changes. The current cellular automata framework is based on the assumption that complex spatially defined phenomena can be modeled by treating dynamics explicitly. Complexity will arise from the interaction among actors rather than from the behavioral assumptions for each actor. Cellular automata use a grid of cells to describe the spatial dimensions of the system and an incremental stepwise analysis of all cells to approximate the temporal dimension. This type of analysis has been demonstrated to be useful in a wide range of dynamic systems, but only recently have the concepts been applied to urban systems.

This contribution presents a series of behavioral models in the spirit of the classical Hotelling location game involving the spatial location of suppliers (sellers) and consumers (customers) in an urban systems context. The location models in this chapter assume fixed prices at the factory gate. This does not preclude the possibility that these prices will vary among plants: obviously, it will be beneficial for a firm to have lower production costs, which will be reflected in the prices at the factory gate. The location of sellers is determined by the relative accessibility to customers and the competition between sellers for customers. The cellular automata approach is used to investigate how fundamental principles of dynamics and evolution which replicate the theoretical results of locational analysis can be applied also in more complex spatial arrangements; see Fig. 3 for an example of two-dimensional location-theoretical results.

Fig. 3 Classical picture of the development of market areas in two dimensions, showing isotims (lines of equal delivered price) and market area boundaries

The dynamic simulations may also be used for predictive purposes to determine the equilibrium states of a customer-seller system at some future point in time. They may also provide insight into the dynamic behavior of customer-seller systems approaching the spatial complexity of real urban areas. Usually, these processes proceed concurrently and may, or may not, be directly related to one another. The speed of the processes will differ, as will the spatial extension of impacts among the agents involved in them. It is difficult to provide a single approach for exploring the fundamental issues in the dynamics and spatial evolution of the urban system. A wide range of different approaches is emerging, derived from the use of artificial intelligence, multiagent-based models, cellular automata, network analysis, dynamic programming, queuing theory, game theory, stochastic simulation, and several other mathematical modeling techniques useful for urban analysis (see also Wegener 2004; Batty 2008).

In the model treated here, the simulation process operates on urban activities using sets of rules for spatial interaction among these activities, including environmental and other constraints. The set of urban activities (workplaces, residential districts, green areas, water surfaces) will vary in different investigations, depending upon the research goals, the level of aggregation required, and what interacting mechanisms


of change among agents are to be considered. The set of rules and constraints for spatial interaction may be more or less well defined and will be modeled using nonlinear system methods, differential equations, or procedural knowledge. The processes will need to be spatially and temporally defined to model the inherent dynamics of multiagent urban systems. In this chapter, these methods are used to develop a class of dynamic land-use models (see also Anas 1987; White and Engelen 1993; Roy and Snickars 1996, 1998; Wegener 2004; Batty 2008). Specifically, we will be examining a customer-seller problem in the spirit of the classical Hotelling location game to demonstrate these principles.

In the cellular automata approach adopted in the chapter, the spatial dimension is represented by a set of cells covering the region to be studied. These cells are often based on a regular grid, and while this is not a fundamental requirement, it does simplify computations involving distance and adjacency. The dynamic state of each cell is determined by its initial state and the dynamics of the states in neighboring cells. Many of the properties of the cellular automata will be determined by the scope of the neighborhood and the rules of interaction among neighboring cells. In addition to these rules, there will be system-wide constraints and changes imposed by externalities outside the scope of the urban system itself. The constraints will often be employed to introduce different types of public policies to address the externalities caused by the interaction among urban agents. It is also possible to introduce agglomeration factors in the framework by increasing the attractiveness of clustering spatial location elements around already existing clusters.

The resulting modeling framework is rather simple and can be readily modified to add new interaction mechanisms while maintaining the same model structure. We still need to be concerned about the integrity of the modeling process, to ensure that it adequately represents the interaction mechanisms we wish to include. The chapter is concerned with attempting to show, for the case of simple models, that we can obtain results consistent with classical theories. We will then extend the analysis with the help of our model to more complex situations where theoretical analysis will not be able to give closed-form results. A further ambition is to compare the equilibrium outcomes in a cellular automata framework to the ones derived from static equilibrium theory.

Similar land-use models based on cellular automata have been proposed by, for instance, Roy and Snickars (1996), White and Engelen (1993, 1997), Semboloni (2000), and Liu (2009). Most of them are intended for investigation of basic questions of emergent urban form rather than to provide simulations of the spatial development of particular cities. The contributions suggest different behavioral processes for the representation of urban activities. The common principles for all models are that the spatial structure of urban land use is approximated by a regular grid of cells, each cell representing a single type of land use. According to specified sets of transformation rules, the models convert cells from one state to another and so produce fractal or bi-fractal land-use structures for the urbanized area and for each land-use type. Transformation rules are generally simple and yet can produce highly organized, complex, evolving structures. The set of rules


may be divided into those that permit an unused cell to be set to a certain state and those permitting cells to become locked in a particular state or to allow displacement of one land use by another. A set of weighting parameters is used to represent the relative competitiveness of various land-use activities. These weighting parameters control the spatial and sectoral patterns of interaction. The choice of weighting factors is thus a part of the research investigation: to discover where and when particular values have significant influences on system behavior. The simulation process begins from a predefined initial state for the land-use pattern and evolves into new states through a series of transformations. The transformation rules for cells can be written quite generally, but they typically rely on the present state and the activities in neighboring cells to determine the subsequent state of the cell. The neighborhood concept need not be based on geographical proximity: it can also relate to linkages within economic clusters of sectors of the economy or clusters in the sense of industrial districts. The model suggested by Roy and Snickars (1996) is more general in its approach to the simulation of urban system dynamics because it does not impose restrictions on the character of neighborhood bounds. Furthermore, it does not impose a predetermined development cycle where one land-use type has externally determined predominance over another.
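As a minimal illustration of such transformation rules, the sketch below implements one invented rule of this general kind: an unused cell adopts the land use with the highest weighted presence in its 3 × 3 neighborhood, while occupied cells stay locked. It is not the model of Roy and Snickars, only a toy instance of the rule structure just described; all states and weights are invented.

```python
import numpy as np

# Toy cellular automaton with weighted neighborhood competition between
# two land uses; all parameters are invented for illustration.
EMPTY, HOUSING, INDUSTRY = 0, 1, 2
weights = {HOUSING: 1.0, INDUSTRY: 1.3}   # relative competitiveness

rng = np.random.default_rng(1)
grid = rng.choice([EMPTY, HOUSING, INDUSTRY], size=(20, 20), p=[0.8, 0.1, 0.1])

def step(g):
    new = g.copy()
    for i in range(1, g.shape[0] - 1):
        for j in range(1, g.shape[1] - 1):
            if g[i, j] != EMPTY:
                continue                   # occupied cells stay locked here
            nb = g[i-1:i+2, j-1:j+2]
            score = {s: w * np.count_nonzero(nb == s) for s, w in weights.items()}
            best = max(score, key=score.get)
            if score[best] > 0:
                new[i, j] = best           # unused cell converts to the leading use
    return new

for _ in range(5):
    grid = step(grid)
print((grid == HOUSING).sum(), (grid == INDUSTRY).sum())
```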

3 Theoretical Modeling of Location Choices

As mentioned earlier, the model of location choice of sellers in an urban system is inspired by the classical article by Hotelling (1929). The topic has been treated extensively in later location theory work; see, e.g., Gabszewicz et al. (1986), Thisse et al. (1996), Fujita et al. (1999), and Nickel and Puerto (2005). A seminal contribution was made by d'Aspremont et al. (1979), where results were proved for general customer-seller configurations using methods from game theory. The Hotelling location analysis addressed the question of competition in space by letting sellers compete for customers both with their choice of location and with their product price. The question was what would be the equilibrium location and price configuration for different assumptions about customer behavior and schemes of cooperation among sellers.

In the Hotelling location game, prices are fixed and the firms choose the most appropriate locations for their activities. Consumers or customers are distributed along the interval [0,1] with a uniform density equal to one. The prices equal one, and production costs are zero. The players in the Hotelling game are sellers, say a, who simultaneously choose locations x(a) ∈ [0,1]. They are ordered by their location x(1) < x(2) < ... < x(n), with the conventions x(0) = 0 and x(n + 1) = 1. Seller a attracts half of the customers in the gap on each side of him, so that his payoff is (x(a) − x(a − 1))/2 + (x(a + 1) − x(a))/2 (an outermost seller in fact attracts all customers in the adjacent boundary segment).

The existence of an equilibrium in the Hotelling location game was investigated for homogeneous and discriminatory pricing, elastic and inelastic demand, independent and interacting products and bundles of products, non-differentiated and differentiated consumers, various distributions of consumers, different models of distance, and different numbers of sellers (see Gabszewicz et al. (1986) for an overview of results). Eaton and Lipsey (1975) proved the existence of equilibrium in pure strategies with more than three sellers. Dasgupta and Maskin (1986a, b) and Simon (1987) investigated equilibrium properties in mixed strategies for any number of sellers in a space of any dimension. The main result is that equilibria exist for a wide variety of assumptions. The case of three sellers seems to be an exception: in that particular case, there is a game of musical chairs in which some seller always has an incentive to change location among a limited set of cells.

In this chapter, we investigate a number of related modeling frameworks to illustrate the complexities of the Hotelling location game. The different models can be related to the classical questions in location theory formulated by Palander (1935), Hoover (1948), and others. Essentially, the setup means that space is subdivided into discrete cells. In the framework of Hotelling's classical ice-cream vendor problem, we think of a beach with people assigned to predetermined chairs or sunshades. In the game theory framework, the problems we will primarily address are what will happen when new sellers arrive at the marketplace, i.e., at the linear beach: Will there be a single stable equilibrium, or will there exist cycles with sellers roaming the beach in search of ice-cream buyers? What patterns will emerge in a situation when some sellers are fixed in space and others are mobile? What patterns will emerge under different assumptions about the mechanisms of interaction among sellers? There will be homogeneous sellers and buyers, and the products will not be differentiated. Generalizations are not pursued here, as they would cloud the results of the behaviors we are trying to model.

We illustrate how spatial patterns emerge from different assumptions with the help of a specifically developed model; see Roy et al. (2000) for a full description of the analyses. Our basic problem concerns the operation of a market with multiple sellers and customers, where the behavior of the sellers (e.g., ice-cream vendors) is driven by the objective to maximize market share and the behavior of the customers (e.g., visitors to the beach) is to maximize accessibility to the product for sale (i.e., ice cream). Since the behavior of sellers and customers is not complementary, we will consider a total of four frameworks, each one intended to model a particular behavioral paradigm. There are two different market types and two optimization strategies. The first optimization strategy takes a seller perspective, specifically optimizing for the last seller to enter the market (sellers do not cooperate). The second takes the customer perspective, thus optimizing the total benefit to all customers. For the first market type, customers are assumed to use only the most accessible (or closest) seller. We will call this a closed market. In the second case, customers share their purchasing among all sellers in proportion to their relative accessibility. We will call this an open market. The relationships between customer and seller behavior are shown in Table 1.

Table 1 The considered location models and market types

                                                    Seller perspective:     Customer perspective:
                                                    noncooperative game     welfare maximization
Closed market (customers use nearest seller only)   Model 1                 Model 2
Open market (customers use accessible sellers)      Model 3                 Model 4
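Before turning to the cell-based models, the payoff structure of the classical line game can be made concrete with a short sketch of the market shares (with the boundary segments fully captured by the outermost sellers, as noted above). The function name is ours, for illustration only.

```python
def hotelling_payoffs(locations):
    """Market shares in the Hotelling line game on [0, 1]: each seller gets
    half of the gap to each neighboring seller and all of an adjacent
    boundary segment."""
    xs = sorted(locations)
    n = len(xs)
    shares = []
    for i, x in enumerate(xs):
        left = (x - xs[i - 1]) / 2 if i > 0 else x              # whole left edge
        right = (xs[i + 1] - x) / 2 if i < n - 1 else 1.0 - x   # whole right edge
        shares.append(left + right)
    return shares

print(hotelling_payoffs([0.25, 0.75]))  # [0.5, 0.5]
print(hotelling_payoffs([0.40, 0.75]))  # [0.575, 0.425]: moving inward pays
```

The second call shows the centripetal incentive behind the classical clustering result: a seller who moves toward the rival gains market share at the rival's expense.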

In Model 1, sellers locate to maximize their market share and do not cooperate in sharing the market. The market share for a new seller is determined by maximizing the number of customers who are closer to the new seller than to any of the other sellers. We assume that a new seller entering the market has only the choice of location to maximize the market share; no other mechanisms are available (e.g., price or product differentiation).

In Model 2, sellers locate to maximize the total accessibility of all customers, with each customer choosing to use the nearest seller only. From the customers' point of view, a seller that is closer offers additional benefit. Accessibility is estimated by a negative exponential distance decay function.

In Model 3, sellers locate to optimize their market share but do not cooperate in sharing the market. The market share is determined by the accessibility of sellers to customers. It is assumed that sellers can attract a proportion of all customers, depending on the relative accessibility of sellers to customers.

In Model 4, sellers locate to maximize the total benefit to all customers. Customers share their purchasing among all sellers in proportion to their relative accessibility as measured by a negative exponential distance decay function.

In the context of cellular automata, we consider our market to be defined over a grid of cells, each occupied by a seller or a customer (or empty). From some initial state, we examine how the system evolves as more sellers are added. There are thus two cases to consider. The first assumes that, once located, sellers do not relocate. This framework attempts to model urban systems in a development phase, or systems in which some seller units are fixed and others mobile. The second case assumes that after the addition of a new seller there is some time period during which the urban system adjusts toward an equilibrium state; this is facilitated by allowing all sellers to change location to improve their relative payoffs. While these modeling frameworks represent just a small sample of possible options and behavioral assumptions, they will permit us to see how the cellular automata handle the different situations. Our objective is to demonstrate that a cellular automata approach can produce results consistent with what we should expect from classical theory or intuitively from a behavioral analysis.

4 Basic Modeling Principles and Assumptions

As the basis for modeling, we take a spatial framework based on a regular grid of cells (see also Roy et al. 2000). Each cell represents a unit of space which may contain some particular urban activity. The spatial arrangement of cells reflects the spatial organization of a seller-customer system. One might consider two basic spatial arrangements: a classical one-dimensional model where the customers are located along a line, and a two-dimensional problem with a square array of customers. The analysis permits a range of cell types and the specification of several operational parameters (e.g., accessibility indices, competition factors, allocation sequences, clustering mechanisms, and distance metrics). Each cell will only be allocated to one seller, but a seller may be located in the same cell as a customer.

The assumptions we make for the purpose of comparability are as follows: Distances are Euclidean and measured from cell center to cell center. The negative exponential distance attenuation parameter is taken as 1.0. In one dimension, sellers locate along the edge of a line of customers. In two dimensions, sellers may overlay customers within occupied cells.

We begin with some general definitions: A is the set of sellers, a being any seller and b a new seller; a, b ∈ A. S is the set of customers, s being any customer, s ∈ S. a, b, and s represent locations x(a), x(b), and x(s). d(s, a) denotes the distance from a customer at cell s to a seller at cell a. W(a) denotes the attractiveness of cell a for a new seller, and W the maximal attractiveness over all cells of the total system.

Model 1: Closed market, seller perspective, and noncooperative game. Given that a system exists with a number of sellers and customers, an additional seller will locate at the cell a that maximizes the market share for the new seller. Sellers do not cooperate in any way. The market share is determined from the number of customers closer to a than to any other seller. Hence a will be chosen as follows:

W = max_a W(a),   (1)

W(a) = Σ_s δ(s, a), where δ(s, a) = 1 if d(s, a) < min_{b ≠ a} d(s, b), and δ(s, a) = 0 otherwise.

The new seller adopts a selfish view, attempting to claim as much of the market share as possible, knowing that if she locates so that her cell is closer to a customer than any other seller, then she will claim all the purchases of that customer. Naturally, the other sellers will not be content with the situation and, if possible, will attempt to relocate to reclaim some of their lost market shares.

Model 2: Closed market, customer perspective, and welfare maximization. This model takes the customer perspective. The location of a new seller is taken to maximize the accessibility of customers to sellers. Customers do not care which seller they use, but they will choose the closest and will (collectively) be more satisfied if the total accessibility is maximized. The attractiveness W is defined as follows (μ is a distance attenuation parameter):

W = max_a W(a),   (2)

W(a) = Σ_s exp(−μ d*(s)), where d*(s) = min_b d(s, b) is the distance from customer s to its nearest seller, the candidate a included.

This implies that the new seller will be located at the cell a which results in the total accessibility for all customers being maximized. Each customer chooses the closest seller exclusively for their purchases. Allowing existing sellers to relocate in response to the new seller entering the system may result in further improvements to the collective payoff to customers. Sellers placed closer to customers are considered more beneficial, in accordance with the posited distance attenuation function.

Model 3: Open market, seller perspective, and noncooperative game. Here we assume that customers are prepared to share their custom over all sellers in proportion to their relative accessibility to sellers. When a new seller locates (at cell a), he/she can count on capturing a proportion of all customers' business and so attempts to maximize this share. The proportion is assumed to be based on the relative accessibility of sellers to customers, as computed from a standard distance-based accessibility measure. The new seller is thus located at the cell a so that

W = max_a W(a),   (3)

W(a) = Σ_s [exp(−μ d(s, a)) / Σ_b exp(−μ d(s, b))], where b runs over all sellers including the entrant.

As with Model 1, the location of a new seller will most probably reduce the market share of the remaining sellers, who may then wish to relocate in an attempt to minimize this loss. In the accessibility case, the total purchases from the customers will depend on the number of sellers, unlike in the closed market case, where the total demand in the system stays the same irrespective of the total number of sellers and their locations.

Model 4: Open market, customer perspective, and welfare maximization. In this final model, the location of the new seller is taken to maximize the collective payoff to all customers, assuming that customers share their purchasing power with all sellers in proportion to the relative accessibility of sellers to customers. In this case we have, therefore, the new seller located at the cell a so that

W = max_a W(a), with W(a) = Σ_s exp(−μ d(s, a)).   (4)

As with the previous models, if existing sellers are permitted to relocate, further improvements in total customer payoffs may be possible.
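The four attractiveness functions of Eqs. (1)-(4) are compact enough to sketch for a one-dimensional market, taking μ = 1.0 as assumed above. The function and variable names are ours, and the enumeration of candidate cells is a brute-force simplification.

```python
import numpy as np

MU = 1.0  # distance attenuation parameter, as assumed in the text

def attractiveness(candidate, sellers, customers, model):
    """W(a) of Eqs. (1)-(4) for a candidate cell, given existing seller
    cells and customer cells, all as 1-D coordinates."""
    locs = np.array(sellers + [candidate], dtype=float)
    d = np.abs(np.array(customers, dtype=float)[:, None] - locs[None, :])
    if model == 1:   # Eq. (1): customers strictly closer to the entrant
        return np.sum(d[:, -1] < d[:, :-1].min(axis=1, initial=np.inf))
    if model == 2:   # Eq. (2): nearest-seller accessibility of all customers
        return np.sum(np.exp(-MU * d.min(axis=1)))
    if model == 3:   # Eq. (3): entrant's share of each customer's custom
        w = np.exp(-MU * d)
        return np.sum(w[:, -1] / w.sum(axis=1))
    if model == 4:   # Eq. (4): accessibility added by the entrant
        return np.sum(np.exp(-MU * d[:, -1]))

def best_cell(cells, sellers, customers, model):
    """The allocation step: the free cell maximizing W(a)."""
    free = [c for c in cells if c not in sellers]
    return max(free, key=lambda c: attractiveness(c, sellers, customers, model))
```

best_cell simply enumerates the free cells and returns the one maximizing W(a), mirroring the allocation rule described above; ties are broken arbitrarily, which is one source of the multiple equilibria discussed in the next section.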

5 Experimenting with the Hotelling Model

To study the behavior of these models, an experimental test bed will be established. It may be, for instance, that the end result is path-dependent and thus influenced by the initial spatial distribution of the sellers. There could also be deviations between theory and practice because of our decision to use discrete cells rather than a continuum of possible locations (see also Puu 2003). We are interested in comparing the cellular automata results with the classical theories described earlier in the chapter.

Our linear region must be of finite size to be computationally manageable. This will naturally introduce some edge effects due to the size of the system. We expect these will not cause fundamental problems, provided the models are sufficiently large (i.e., have a large enough number of cells) relative to the number of sellers added to the system. The likely effect of a small system is that payoffs will occasionally not vary across location cells. We therefore expect several equilibrium patterns of location to emerge. This is also confirmed by theoretical experiments with the models in the framework of game theory. In these experiments, which have been done for the one-dimensional case, the result is that the best-response functions of the sellers contain sets of locations with equal payoffs; see the theoretical considerations in, for instance, Rasmusen (1989). A best-response function reveals the best response of one seller given the locations of all other sellers. Since these best responses are set-valued, it is to be expected that the simulations will indicate the existence of several possible equilibrium situations, or situations in which sellers cycle between different locations. An analysis has been performed to compute the equilibrium state in mixed strategies for some simple cases of the model. It shows that already in the case of two sellers, the classical Hotelling solution will not always appear, simply as a result of the existence of several best-response locations under the metric used.

The linear model consists of a line of 20 customer cells, with (initially) one seller located at the fourth cell from the left (see the sequence of figures below). Three more sellers are then allocated to the system. The displays show the locations of sellers, assuming all other cells house customers. We consider two cases, one where sellers are fixed and cannot relocate once they have made an initial decision and one where they can and, indeed, generally do relocate. As a sensitivity test, simulations are also performed for other initial positions of the first seller.

In the first case, a single seller is placed in the fourth slot, and the positions for the second and subsequent allocations are computed for the given system state. In the second case, the initial position for the new seller is computed and the seller allocated. Then the positions of each seller are reviewed (in turn), and the sellers are relocated if better positions can be found. This relocation process is repeated iteratively until an equilibrium is obtained (i.e., until no locational change for any seller can improve his/her individual payoff). In some cases, a unique equilibrium is not obtained, as the allocation pattern cycles through a sequence of cells.

In Figs. 4, 5, 6, and 7, the seller positions are shown shaded. Where a final stable equilibrium state is not found, the sellers tend to cycle through a number of states, the range of which is shown by the more lightly shaded cells. The darker shaded cells show a typical (but not an equilibrium) state. The lack of convergence is not unexpected, as we are dealing with a system with discrete spatial positions and a relatively limited number of customers compared to sellers.

[Fig. 4 Closed market, seller perspective and noncooperative game (Model 1): (a) sellers fixed after initial allocation and (b) sellers reallocated after new seller entered. The vertical dimension represents the end situation after additional sellers have entered. The hashed areas represent cells in which cycles occur. Panels (a) "No relocation allowed" and (b) "Relocation allowed" each show the 20-cell linear market, cells numbered 1-20.]
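The iterative relocation procedure just described can be sketched as follows, reusing the best_cell helper from the earlier sketch. The cycle detection via a set of previously seen configurations is our own device for illustration, not a feature claimed for the original model.

```python
def relocate_until_stable(cells, sellers, customers, model, max_rounds=200):
    """Review each seller in turn and move it to its best-response cell;
    stop at a fixed point (equilibrium) or when a previously seen
    configuration recurs (a cycle), as observed in the experiments."""
    seen = {tuple(sorted(sellers))}
    for _ in range(max_rounds):
        moved = False
        for i in range(len(sellers)):
            others = sellers[:i] + sellers[i + 1:]
            best = best_cell(cells, others, customers, model)
            if best != sellers[i]:
                sellers[i] = best
                moved = True
        if not moved:
            return sellers, "equilibrium"
        state = tuple(sorted(sellers))
        if state in seen:
            return sellers, "cycle"
        seen.add(state)
    return sellers, "no convergence"
```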

6 Analysis of the Simulation Results

We will now discuss the results in relation to a set of figures showing the end results of a number of simulations. The figures are organized so that each row represents the end state of the system with one, two, three, and four sellers. Note also that the market is represented by a one-dimensional array of 20 cells, each of which can house one seller at a time.

Model 1 represents a selfish strategy for each new seller, implying that sellers do not cooperate to share the market. This can be seen in Fig. 4a. Each new seller takes a position immediately to the right of the previously allocated seller, thus claiming the market of all customers to her right. The other sellers naturally lose market share, and when allowed to relocate, as shown in Fig. 4b, they cluster at the center of the line. The second row shows the classical Hotelling result. The states for three or more sellers do not settle into a stable equilibrium, since one or more of the sellers will always see a way of improving their market share (there are no relocation costs in our model). This is in line with the theoretical developments offered by, e.g., d'Aspremont et al. (1979). It may be noted that the cycling in the fourth row will keep the four sellers as close to one another as possible in the middle of the market. The simulation shows, however, that the outermost spatial position will be challenged, leading to some likelihood that this cell will also be occupied.

Model 2 takes the customer perspective, and thus we would expect to see the distribution of sellers optimized to suit the customers. This result can be seen in Fig. 5. When sellers do not relocate, Fig. 5a, the new seller locates so as to split the longest stretch of customers between sellers in half (approximately, of course). This is more clearly seen in Fig. 5b, when sellers are allowed to relocate. This model gives results that seem to follow directly from the Hotelling problem. The welfare-maximizing spatial pattern is such that the sellers cover the market rather than crowding toward the center of the joint market. The ultimate spacing is, of course, influenced by the fact that the market covers exactly 20 cells. It is to be noted that no instabilities occur in this model.

Model 3 takes a seller perspective, but this time opens the market: we assume that customers share their purchasing power among all sellers, in accordance with their relative accessibility to each. The sellers do not cooperate to share the market. The result is shown in Fig. 6. The results are similar to the closed market case (Model 1), but with the sellers being more spatially distributed. This is to be expected, as the sellers share the customers' market, making the choice of location less sensitive to claiming customers from existing sellers. The sellers can take customers from each other without piggybacking on each other at the middle of the market.

[Fig. 5 Closed market, customer perspective and welfare maximization (Model 2): (a) sellers fixed after initial allocation and (b) sellers reallocated after new seller located. The vertical dimension represents the end situation after additional sellers have entered. Panels (a) "No relocation allowed" and (b) "Relocation allowed" each show the 20-cell linear market, cells numbered 1-20.]


[Fig. 6 Open market, seller perspective and noncooperative game (Model 3): (a) sellers fixed after initial allocation and (b) sellers reallocated after new seller located. The vertical dimension represents the end situation after additional sellers have entered. The hashed areas represent cells in which cycles occur. Panels (a) "No relocation allowed" and (b) "Relocation allowed" each show the 20-cell linear market, cells numbered 1-20.]

Model 4 takes the customer perspective with the open market strategy. In the one-dimensional market case shown in Fig. 7, the results are quite clear. Sellers locate to maximize the total accessibility to customers. Since, from the customers' view, the sellers are not competing with each other, the sellers tend to congregate toward the center of the linear market. Comparing the end result for the closed market case with the open market one, one can observe that the spatial patterns are more concentrated in the open market case than in the closed market one, where sellers only access the nearest customers. The result will be sensitive to the choice of the distance decay parameter used to represent the attenuation of demand.

The simulation results above have been developed for the case of a linear market. Let us assume instead that the spatial competition takes place on a square with 10 cells in each direction and that the first seller is placed in cell (1,1). The end point for the noncooperative case in which sellers cannot relocate will be that late-coming sellers take over the market by placing themselves on the diagonal from that corner. If the sellers are allowed to relocate, they will cycle around the central cell (5,5). Thus, they will not be placed only in the middle area made up of the most central cells. One reason for this is the fact that the total market is limited in all directions. In the case of welfare maximization, again starting with a seller in cell (1,1), the end result in the relocation case will be, as expected, that each seller creates a local monopoly at one of the four corners. In the case of fixed locations, the result is similar, although influenced by the starting location of the first seller. The end results in the open market case are spatially more complex, with cycles occurring.
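The two-dimensional experiments only require replacing the line metric by a cell-center Euclidean distance. A minimal sketch of that step, with our own helper names and the 10-by-10 setting of the text:

```python
import numpy as np

def euclidean_d(a, b):
    """Cell-center Euclidean distances between (row, col) cell arrays."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

# A 10 x 10 market with a customer in every cell, as in the text.
customers = [(i, j) for i in range(1, 11) for j in range(1, 11)]

# Model 4 attractiveness (Eq. 4) of the starting corner (1,1) versus the
# central cell (5,5): the central cell dominates, consistent with the
# congregation toward the middle reported above.
d = euclidean_d(customers, [(1, 1), (5, 5)])
print(np.exp(-1.0 * d).sum(axis=0))
```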


[Fig. 7 Open market, customer perspective and welfare maximization (Model 4): (a) sellers fixed after initial allocation and (b) sellers reallocated after new seller entered. The vertical dimension represents the end situation after additional sellers have entered. Panels (a) "No relocation allowed" and (b) "Relocation allowed" each show the 20-cell linear market, cells numbered 1-20.]

A general conclusion from the experiments is that in simple cases the theoretical results are replicated. However, when complexity increases, the classical theories lose some of their predictive power. The end results are path-dependent, i.e., they differ depending on the starting position of the first seller. The resulting patterns of spatial competition are not always stable; instead, sellers may cycle between a limited set of cells. This indicates the complexity of spatial competition and makes it necessary to develop more complex urban simulation models to attain further predictive power.

7 Conclusions

The modeling process by which the above results have been generated is quite simple. At the same time, the results are closely related to the classical problems of location theory posed by Hotelling, Hoover, and Palander. What we have illustrated is that we can replicate these results with good precision and efficiency using modern computational techniques. We have also demonstrated that modern theoretical work can gain substantial support from combining strict mathematical modeling with computer simulations. Finally, we have demonstrated the capacity to add complexity stepwise, showing how the further introduction of behavioral realism affects the results both in terms of end-point equilibria and in terms of development paths in spatially competitive settings. It is also possible to use these methods to illustrate in a pedagogical way how different parameter settings change the resulting spatial patterns.


The process of integrating different modeling frameworks into the cellular automata context is straightforward. We need to define the appropriate accessibility function and the proper selection criteria for the location choice of subsequent sellers. This function is applied to all cells (or at least to a sufficiently large number) to cover the region of interest. The cell meeting the selection criteria is then designated to be a seller location. While the cellular automata framework facilitates a whole range of more complex analyses that might affect the choice of location, we have not included these here. For example, it is straightforward to introduce issues of price competition, cost of relocation, policy restrictions, clustering, and other nonmarket factors (e.g., collusion among a subset of sellers). The recent literature abounds with examples of such attempts, as shown, for instance, by the reviews of Wegener (2004) and Batty (2008).

With our modeling framework, we gain an immediate impression of the emerging spatial and temporal organization of the spatial competition system. We can observe the changes as they are computed and quite readily see whether the behavior is in line with what might be intuitively expected. This creates an opportunity to explore a range of initial states and to evaluate the sensitivity of the simulation outcomes to the various modeling parameters. Such a capability is a useful addition to the available tools for the analysis of the dynamics of complex urban and regional systems.

It appears from our studies that cellular automata can produce results which seem plausible and demonstrate behaviors that we might have expected under classical theoretical assumptions. They also replicate the theoretical results in cases where strict comparisons can be made. This might be of value in studying the dynamics of complex processes and learning more about how they affect the form and composition of urban regions. The cellular automata are a conceptual tool that can be used in teaching or in visualizing behavioral dynamics. In this mode of use, the method accompanies theoretical analyses of urban systems, strengthening the theoretical insight into the behavior of urban agents.

The computations require information about the temporal state of cells and the ability to compute properties of neighboring cells. Such computations, as well as the graphical display of the cells with their properties, are generally well handled in geographic information systems, which contain the spatial information. A limiting factor in treating different dynamic processes interdependently will be the available computational speed. Accessibility computations like those used here are very sensitive to the number of active cells in the system. Implementations will thus require some care to optimize computational processes even with modern computing speeds.
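As a concrete illustration of the last point, vectorizing the distance computation is usually the first optimization; the scale of the example below is our own assumption.

```python
import numpy as np

# A 100 x 100 grid of active cells. The cell-to-seller distances computed
# here stay manageable, but a full cell-to-cell matrix would already hold
# 10^8 entries, which is why accessibility computations are so sensitive
# to the number of active cells.
coords = np.array([(i, j) for i in range(100) for j in range(100)], dtype=float)
sellers = coords[::500]                          # an assumed sparse set of sellers

d = np.linalg.norm(coords[:, None, :] - sellers[None, :, :], axis=2)
accessibility = np.exp(-1.0 * d).sum(axis=1)     # mu = 1.0, as in the text
```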

References

Anas A (1987) Modelling in urban and regional economics. Harwood Academic, Chur
Batty M (2008) Fifty years of urban modelling: macro statics to micro dynamics. In: Albeverio S, Andrey D, Giordano P, Vancheri A (eds) The dynamics of complex urban systems: an interdisciplinary approach. Physica, Heidelberg, pp 1–20
d'Aspremont C, Gabszewicz J, Thisse J (1979) On Hotelling's "Stability in competition". Econometrica 47(5):1145–1150
Dasgupta P, Maskin E (1986a) The existence of equilibrium in discontinuous economic games I: theory. Rev Econ Stud 53(1):1–26
Dasgupta P, Maskin E (1986b) The existence of equilibrium in discontinuous economic games II: applications. Rev Econ Stud 53(1):27–41
Downs A (1957) An economic theory of democracy. Harper and Row, New York
Eaton BC, Lipsey R (1975) The principle of minimum differentiation reconsidered: some new developments in the theory of spatial competition. Rev Econ Stud 42(1):27–49
Fujita M, Krugman P, Venables A (1999) The spatial economy: cities, regions and international trade. MIT Press, Cambridge, MA
Gabszewicz J, Thisse J, Fujita M, Schweizer U (1986) Location theory. Harwood, New York
Heffley D (1972) The quadratic assignment problem: a note. Econometrica 40(6):1155–1162
Hoover EM (1938) Location theory and the shoe and leather industry. Harvard University Press, Cambridge, MA
Hoover EM (1948) The location of economic activity. McGraw Hill, New York
Hotelling H (1929) Stability in competition. Econ J 39(153):41–57
Koopmans TC, Beckmann M (1957) Assignment problems and the location of economic activities. Econometrica 25(1):53–76
Leydesdorff L (2002) The complex dynamics of technological innovation: a comparison of models using cellular automata. Syst Res Behav Sci 19(6):563–575
Liu Y (2009) Modelling urban development with geographical information systems and cellular automata. Taylor and Francis, Boca Raton
Nickel S, Puerto J (2005) Location theory: a unified approach. Springer, Berlin/Heidelberg
Palander T (1935) Beiträge zur Standortstheorie. Almqvist & Wiksell, Uppsala
Puu T (2003) Mathematical location and land use theory: an introduction. Springer, Berlin/Heidelberg/New York
Rasmusen E (1989) Games and information: an introduction to game theory. Blackwell, Oxford
Roy GG, Snickars F (1996) City life: a study of cellular automata in urban dynamics. In: Fischer M, Scholten H, Unwin D (eds) Spatial analytical perspectives on GIS. Unwin and Hyman, London, pp 213–228
Roy GG, Snickars F (1998) An interactive computer system for land-use transport interaction. In: Lundqvist L, Mattsson L-G, Kim TJ (eds) Network infrastructure and the urban environment: advances in spatial systems modelling. Springer, Berlin/Heidelberg/New York, pp 350–370
Roy GG, Snickars F, Zaitseva G (2000) Simulation modelling of location choices in urban systems. In: Fotheringham AS, Wegener M (eds) Spatial models and GIS: new potentials and new models. Taylor and Francis, London, pp 185–201
Semboloni F (2000) The growth of an urban cluster into a dynamic self-modifying spatial pattern. Environ Plan B Plan Design 27(4):549–564
Simon L (1987) Games with discontinuous payoffs. Rev Econ Stud 54(4):569–598
Snickars F (1978) Convexity and duality properties of a quadratic intraregional location model. Reg Sci Urban Econ 7(4):5–19
Thisse JF, Button K, Nijkamp P (1996) Modern classics in regional science: location theory. Edward Elgar, London
Wegener M (2004) Overview of land-use transport models. In: Hensher DA, Button K (eds) Transport geography and spatial systems, handbook 5 of the handbook in transport. Pergamon/Elsevier Science, Kidlington, pp 127–146
White R, Engelen G (1993) Cellular automata and fractal urban form: a cellular modelling approach to the evolution of urban land-use patterns. Environ Plan A 25(8):1175–1199
White R, Engelen G (1997) Cellular automata as the basis of integrated dynamic regional modelling. Environ Plan B Plan Design 24(2):235–246

Spatial Interaction Models: A Broad Historical Perspective

Johannes Bröcker

Contents
1 Introduction and Notation ..... 100
2 Early Ideas ..... 101
3 Social Physics ..... 103
4 Maximum Entropy ..... 106
5 Maximum Utility ..... 110
6 Random Choice ..... 111
7 Trade ..... 114
  7.1 The Specific Goods Model ..... 116
  7.2 The Monopolistic Competition Model ..... 118
  7.3 The Ricardian Model ..... 119
8 Conclusion ..... 121
9 Cross-References ..... 122
References ..... 122

Abstract

More than a century ago the Austrian railway engineer Eduard Lill stated for the first time, in mathematical form, an empirical regularity that today is known as the gravity model of spatial interaction. This chapter reviews five, partly overlapping, waves of literature aiming at a theoretical underpinning of the simplest and of more sophisticated forms of the gravity equation: (1) the Social Physics analogy, (2) the maximum entropy approach, (3) the maximum (deterministic) utility model, (4) the random choice model, and (5) the microeconomic general equilibrium model. The chapter shows that, from different angles, isomorphic formal structures have emerged from the different schools of thought, though recent trade research applying gravity models as its workhorse is not fully aware of this isomorphism. The chapter furthermore emphasizes that theoretical progress did not only deliver a theoretical underpinning of what empiricists did anyway, but contributed to progress in two more ways: it transformed the simple model treating individual origin-destination flows in isolation into a system model accounting for the general interdependence of flows between all origins and all destinations, and it made it possible to extend the field of applications from measurement and analysis to policy evaluation.

J. Bröcker (*) Faculty of Business, Economics and Social Sciences, Institute for Environmental, Resource and Regional Economics, CAU Kiel University, Kiel, Germany. e-mail: [email protected]

Keywords

Gravity model · Spatial interaction · Maximum entropy · Random choice · Trade model · Social physics · General equilibrium · Consumer’s surplus

1 Introduction and Notation

What are the main factors influencing the interaction of people across space? How can these factors be put together in a mathematical structure that allows for parametric estimation, quantitative prediction, and application to policy evaluation and decision-making? First rudimentary answers to these questions go back to the end of the nineteenth century, and the literature began to flourish with the systematic presentation of the so-called gravity model around 1930, foreshadowed, though not under this name, by the law of passenger travel formulated by the Austrian railway engineer Eduard Lill (1889a, b). Obviously, for one form of spatial interaction, namely trade, theory is much older, having had its first climax with Ricardo's theory of comparative advantage. But it focuses on countries, not regions, and the spatial dimension is abstracted from by assumption. To the contrary, the spatial dimension is the main focus of the models we are reviewing in this chapter.

Spatial interaction models aim at analyzing observations with multiple spatial references, such as movements of people during a day, starting at home at point r, working at point s, shopping at point t, and returning home to point r. Similarly, during their lives, people are born at point r, educated at point s, get their first job at point t, and so forth. Such chains of interaction refer to an entire sequence of points in space. Models able to cover such chains in a consistent formal structure are, however, of relatively recent origin and will not be reviewed here any further. We confine ourselves to models for dyadic data, referring to just two points in space, called origin and destination of an act of spatial interaction. Aggregates of such acts of spatial interaction are called flows, measured in terms of quantity, value, frequency, intensity, number of persons, and so on.

The most prominent type of spatial interaction model for dyadic data is the famous gravity model. It has the longest tradition and still dominates the literature in many fields, though behavioral theories of spatial interaction may naturally lead to mathematical specifications that do not display gravity form. Because of its prominence, our focus is on models with gravity form, and we will only note in passing how certain schools of thought may lead to different forms.

A gravity model explains origin-destination flows by three types of explanatory variables: those characterizing origins, those characterizing destinations, and those characterizing relations between origin and destination. The respective sets of regions of origin and destination need not be identical, but they often are, and this is what we assume henceforth. Explanatory variables referring to origins and destinations are usually called "masses," in analogy to terms from physics to be discussed below. Masses are indicators supposed to foster flows, such as population and income. Determinants relating to the relation between origin and destination are geographical distances, interaction costs, or any kind of barriers between origins and destinations. We call them distances for short, even though variables promoting rather than hindering interaction, such as trade agreements, language similarities, or cultural closeness, may also be among the relational variables (inverse distances, so to speak). In most cases, the distance between origin r and destination s will be denoted by a single variable c_rs in the following, which is, however, thought of as a composite of the above-mentioned bundle of relational variables. Units of measurement may be money per unit of good or time per trip. In the trade model in Sect. 7 below, it is understood as a dimension-free share of the traded goods that is assumed to be used up for the transportation service (so-called iceberg costs, see below).

We say that a model for origin-destination flows x_rs has gravity form if the flows can be factored in the form

x_rs = a_r b_s f(c_rs),   (1)

where in general both a_r and b_s may depend on masses in all regions as well as on all distances, while f(c_rs), called the "resistance" between r and s, depends only on the distance between r and s. This restriction implies that cross-product ratios depend only on the respective distances:

(x_rs x_r's') / (x_rs' x_r's) = (f(c_rs) f(c_r's')) / (f(c_rs') f(c_r's)).

The traditional (old) gravity model has a more restrictive form, assuming a_r to depend only on masses in r and b_s to depend only on masses in s. We call this the independent flows model.

Gravity models are applicable and have been applied to practically any type of flows in a society: migration, commuting, traffic and transport, trade, investment, payments, phone calls, internet clicks, citations, likes, followers, marriages, and probably more. Applications of the gravity model are so numerous that reviewing them is impossible. Searching for gravity models in Google Scholar returns more than three million entries. Even if entries referring to models from physics are subtracted, the number is still huge.
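The factorization property in Eq. (1) is easy to verify numerically. The following minimal sketch, with made-up masses, distances, and an assumed power resistance function, checks that the cross-product ratio depends only on the resistance terms.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
a = rng.uniform(1, 10, n)           # origin factors a_r
b = rng.uniform(1, 10, n)           # destination factors b_s
c = rng.uniform(1, 5, (n, n))       # "distances" in the wide sense

def f(c):
    return c ** -2.0                # an assumed power resistance function

x = a[:, None] * b[None, :] * f(c)  # gravity form, Eq. (1)

r, s, r2, s2 = 0, 1, 2, 3
lhs = (x[r, s] * x[r2, s2]) / (x[r, s2] * x[r2, s])
rhs = (f(c[r, s]) * f(c[r2, s2])) / (f(c[r, s2]) * f(c[r2, s]))
assert np.isclose(lhs, rhs)         # origin and destination factors cancel
```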

2 Early Ideas

In an early historical survey, Carrothers (1956) suggested Reilly's "law of retail gravitation" (Reilly 1931) to be the first quantitative statement of a gravity model as defined here. His law states that retailers in a city s attract demand from outside that is increasing in the population M_s of the city (its "mass") and decreasing in the distance between the city and the customer. Choosing a power functional form, he defines a breakpoint r on the way between two cities s and s' offering the respective retail service, where the volumes x_rs and x_rs' of the respective businesses attracted by the two cities are just the same, i.e., x_rs = x_rs', implying

(M_s / M_s')^β = (c_rs / c_rs')^γ.

Evaluating an extensive amount of retail trade data, he suggests β to be around one and γ to be around two. Note that this is tantamount to claiming that x_rs has the form

x_rs = a_r M_s / c_rs²,   (2)

where a_r is an as yet undetermined origin factor. Clearly, this is a special case of the gravity form (1). It is worth mentioning that although the law is named the law of retail gravitation, it is regarded purely as an empirical regularity. No reference is made in Reilly's book to Newtonian physics as a possible theoretical justification. "A law is not a theory" (Reilly 1931, p. 32). Just as Newton and Galilei – according to Reilly – identified laws of nature by measurement, he (Reilly) identified his law by measurement. Concluding β to be equal to one and γ equal to two (by and large) is not derived from a belief that flows obey laws of nature directly applied to the social world. The law, ". . . based as it is on the actual measurement of existing conditions, is not a matter of theory or opinion but a law in the true sense of the word" (Reilly 1931, p. 33). It is remarkable that he mentions theory in the same breath as opinion, putting it in the contrary position to a law derived from measurement. There is not the slightest concern about the general validity of conclusions from measurement without theory in the social sciences.

Unknown to Carrothers (1956) and the English-language literature, an equation similar to Eq. (2) was published four decades earlier by Eduard Lill, an Austrian railway official (Lill 1889a, b). Lill postulated the "law of travel" (Reisegesetz), stating that for any individual the frequency of trips times the distance per trip is a constant. The expected number of travelers starting by rail from a big center (Vienna) with population M and aiming at traveling a distance c is thus x = δM/c, with a constant δ depending on travel preferences and the economic importance of the center. Note the difference to formula (2): the willingness to travel a further distance declines with inverse distance, not inverse distance squared. If people entering the train in Vienna could leave the running train at any point they wanted, then the number of travelers leaving the train along the small (infinitesimal) segment from c to c + dc would amount to

dx = δM/c − δM/(c + dc) ≈ dc · δM/c².

Travelers can in fact leave the train only at certain stations, each serving a certain segment of the line. Let travelers enter the train at the center r and consider a station s at a distance c_rs from r serving a line segment of length L_s, the so-called "order district" (Bestellbezirk); then the number of travelers entering at r and leaving at s is

x_rs ≈ δ_r M_r L_s / c_rs².

Lill also allows for forces of attraction due to different political, industrial, or touristic importance of destinations, represented by an additional weight ω_s of the destination. This leads to his final conclusion

x_rs ≈ δ_r ω_s M_r L_s / c_rs²,

again a clean version of a gravity formula, but again without any reference to Newtonian physics, even if the notion of a "force of attraction" has at least a certain social physics flavor.

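With β = 1 and γ = 2, solving (M_s/M_s')^β = (c_rs/c_rs')^γ together with c_rs + c_rs' = D, where D is the distance between the two cities, gives the familiar closed form for the breakpoint: its distance from city s is D / (1 + sqrt(M_s'/M_s)). A small sketch, with a helper name of our own choosing:

```python
from math import sqrt

def reilly_breakpoint(m_s, m_s2, distance):
    """Distance from city s to the point at which the two cities attract
    equal retail volumes, assuming beta = 1 and gamma = 2."""
    return distance / (1 + sqrt(m_s2 / m_s))

# A city of 1,000,000 versus one of 250,000, 60 km apart: the breakpoint
# lies 40 km from the larger city, which thus commands two thirds of the route.
print(reilly_breakpoint(1_000_000, 250_000, 60))  # 40.0
```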
3 Social Physics

After a series of publications collecting evidence showing flows to be increasing in the sizes of origins and destinations and decreasing in distance until the 1940s (see Carrothers (1956) and Niedercorn and Bechdolt (1969) for references), the Princeton astrophysicist John Q. Stewart published a series of papers pointing out a strict analogy between Newton's laws of gravitation and interregional interaction in human societies (Stewart 1941, 1947, 1948). The idea of analogies is much older. Carrothers (1956) cites a remark of the American sociologist and economist Carey (1858): "Man tends of necessity to gravitate to his fellow-man. . . . Gravitation is here, as everywhere else in the material world, in the direct ratio of the mass, and in the inverse one of the distance" (Carey 1858, p. 43).

For two demographic masses M_r and M_s located at a distance c_rs from one another, Stewart (1948) proposes three indicators: the demographic force

F_rs = G M_r M_s / c_rs²,

the demographic energy

E_rs = G M_r M_s / c_rs,

and the demographic potential exerted on a mass at r by a mass M_s at distance c_rs,

V_r = G M_s / c_rs.

The definition of the force corresponds to the Newtonian definition of the amount of force (the force is a vector in space) with gravitation constant G, while the formulae for energy and potential correspond to the ones from physics, but with signs reversed. Potential energy in physics is

−∫_{c_rs}^{∞} G M_r M_s / c² dc = −E_rs.

Potential energy increases with increasing distance, and the zero level is at infinity, by convention. It is thus negative for any finite distance between two bodies. To the contrary, demographic energy decreases with increasing distance and vanishes if the two masses get infinitely far apart. Demographic potential in r is just demographic energy per unit of mass. In a world with many regions, it is the sum

V_r = G Σ_s M_s / c_rs.   (3)

Note that, due to integration, demographic energy and potential decay with inverse distance, while force decays with inverse distance squared. Stewart correlates (by visual inspection) his demographic potential with a series of economic and socio-demographic indicators, finding, among other things, wealth to be larger in high-potential regions, though the relation is less tight in some parts of the USA than in others. We mention, as an historical curiosity, his pejorative correction to weight black people lower than an average American!

Interpreting his demographic potential as an indicator measuring the access to markets, and hence to the acquisition of wealth, makes good sense against the background of modern microeconomic theory of the consumer. With the number of identical individuals M_r, we can interpret

d_rs(c_rs) = x_rs / M_r = G M_s / c_rs²

as individual demand for a trip from r to s with price c_rs. Then the demographic potential V_r in Eq. (3) is the integral of demand with respect to the price from c_rs to infinity, just the standard consumer's surplus. When there are many destinations, the consumer's surplus is V_r as in the demographic potential formula (3). Demands for trips to different destinations are assumed to be independent from one another. This corresponds to the basic assumption of the independent flows model. It will be different when it comes to interdependent flows in structural gravity models, to be dealt with below. Still, V_r is a function of all prices c_r1, . . ., c_rn in the multidestination case, and V_r is a convex function of prices with partial derivatives

∂V_r / ∂c_rs = −d_rs(c_rs).

We call this Hotelling's derivative property. Mind the minus sign! It has to hold generally for a consumer's surplus concept that allows for a strict derivation from utility theory, and it does hold for the consumer surplus measures in structural gravity models to be introduced later. Finally, summing over consumers' surpluses yields demographic energy as the total consumers' surplus in region r. The demographic potential (or consumer's surplus) extends naturally to demand with γ ≠ 2 and γ > 0,

[. . .]

C_rs > 1, indicating that the product price in the destination region exceeds the mill price by 100 (C_rs − 1) percent. The iceberg assumption makes sure that these costs are fixed percentages of the product value for each origin-destination pair, supposing that the resource used for the transport service is the transported good itself: it "melts" during transport. What arrives at the destination is less than what is sent off, but the price is higher at the destination than at the origin, such that the traded values at the origin and destination coincide. The first publications on the gravity

trade model (Anderson 1979; Bergstrand 1985) do not yet make this assumption, but are silent as to where the resource comes from. This generates certain consistency problems in general equilibrium, which we generously disregard here to avoid overloading the exposition with technical clutter. Let c_rs denote distance in the wide sense introduced before, measured in units of the transported good itself. Then the iceberg mark-up factor is

C_rs = 1 + c_rs ≈ exp(c_rs).   (16)

Remember that we understand the term "distance" as a composite of a bundle of trade-impeding or trade-promoting variables, among them transport costs increasing in geographical distance. Surprisingly, most modern applications disregard the constraint C_rs > 1, assuming C_rs to be a power function of geographical distance, inter alia. The careful empirical application in Eaton and Kortum's (2002) paper is a notable exception. With the specification (16), the conditional trade model is the same as the maximum entropy form (6) with constraints (15), θ = γ, and upper-case X for value flows replacing lower-case x for quantity flows. It has to be emphasized that Eq. (14) with constraints (15) is the solution conditional on incomes X_r given for all r. In general, incomes respond in a counterfactual analysis in a way only visible in the full structural models to be developed next.
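Although Eqs. (14) and (15) lie outside the excerpt reproduced here, the conditional model just described, gravity flows consistent with given origin and destination totals, is classically solved by iterative proportional fitting of two balancing factors. A minimal sketch under that reading, with all names our own:

```python
import numpy as np

def conditional_flows(X_orig, X_dest, C, gamma, tol=1e-10):
    """Flows X_rs proportional to a_r * b_s * C_rs**(-gamma), with given
    row sums (origin totals) and column sums (destination totals),
    obtained by iterative proportional fitting of the balancing factors."""
    K = C ** (-gamma)
    a = np.ones(len(X_orig))
    for _ in range(10_000):
        b = X_dest / (K.T @ a)
        a_new = X_orig / (K @ b)
        if np.max(np.abs(a_new - a)) < tol:
            a = a_new
            break
        a = a_new
    return a[:, None] * b[None, :] * K

X_r = np.array([100.0, 80.0, 60.0])    # origin totals
X_s = np.array([90.0, 90.0, 60.0])     # destination totals (same grand sum)
C = 1.0 + np.array([[0.0, 0.4, 0.8],
                    [0.4, 0.0, 0.5],
                    [0.8, 0.5, 0.0]])  # iceberg mark-up factors, all > 1 off-diagonal
X = conditional_flows(X_r, X_s, C, gamma=4.0)
print(X.sum(axis=1), X.sum(axis=0))    # reproduces the given totals
```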

7.1 The Specific Goods Model

This model assumes that commodities are distinguished by their respective place of origin and that the range of commodities in each respective region is exogenous (the so-called Armington assumption, after Armington (1969)). The first version is Anderson (1979), assuming Cobb-Douglas demand. The outcome is Eq. (14) with γ = 0, very simple indeed. In this case Eq. (14) is not only true conditional on given incomes, but the incomes are in fact constant, namely X_r = ω_r X•, where ω_r > 0, Σ_r ω_r = 1, is the share parameter in the Cobb-Douglas preferences. World income can be chosen to be constant by the arbitrary choice of the numéraire. The independence of flows from distance is one of the strange implications, drastically at odds with the evidence. Another one is that no data variations, neither changes of the resources R_r nor changes of distances, have an impact on nominal incomes. If a resource stock goes up – e.g., by a positive productivity shock – the factor price goes down to such an extent that the nominal income remains unchanged. Real incomes do not remain constant, however: if R_r goes up by 10%, say, real incomes go up by ω_r times 10% everywhere. In contrast to these outcomes, one would expect that a positive technology shock raises the nominal income in the respective region, that other regions gain more from the shock if they are closer by, and that trade values are decreasing in distance. To achieve that, the elasticity of substitution has to be larger than in the Cobb-Douglas case, where it is unity. Anderson suggests this extension in an appendix. It was fully worked out and extended further a few years later by Bergstrand (1985).


Bergstrand introduces a couple of model extensions. He allows for imbalanced trade, for nontradables, and for different elasticities of substitution between domestic and imported goods on the one hand and between imported goods from different sources on the other hand. Finally, he introduces an entirely new idea, a finite elasticity of transformation on the supply side. For the sake of tractability, we abstract from these extensions except the last one. It means that producers shift resources from serving one destination to serving a different one only if a change in relative mill prices achievable on the different markets incentivizes them to do so. This differs from the standard assumption that producers are willing to serve any market at the same mill price, wherever the demand comes from. With a constant elasticity of transformation (CET) technology on the supply side, the equilibrium condition for any pair of regions becomes

X_rs = R_r W_r (P_rs / W_r)^(η+1) = ω_r R_s W_s (P_rs C_rs / P_s)^(1−σ),   (17)

with resource price W_r and mill price P_rs for goods from r delivered to s. We denote prices in upper case now to distinguish them from prices in the models of the former sections; we show below how they are related to each other. The income is R_r W_r. The customer's price at the destination, including transport cost, is P_rs C_rs. Remember that the iceberg assumption requires the values of flows at the origin and destination to coincide. The transformation technology is intuitive: a relative increase of the mill price achieved in a destination by 1% lets the sales share going to that destination increase by η + 1 percent. The parameter η is the elasticity of transformation. The parameter ω_r denotes the share parameter of the CES function. Though missing in Bergstrand's original paper, we introduce it for comparability with the other models. In fact it only affects absolute prices, but neither flows nor percentage price changes in a counterfactual experiment.

The factor price W_r is the maximal revenue per unit of the resource in r, defined as

W_r^(1+η) = Σ_s P_rs^(1+η),   (18)

such that the origin constraint Σ_s X_rs = R_r W_r is fulfilled. Hotelling's derivative property says that the partial elasticity of W_r with respect to P_rs is the share of region s in the sales of region r. Similarly, the composite destination price (also called the true cost of living index) is the minimal expenditure per unit of utility in s, defined as the generalized average

P_s^(1−σ) = Σ_r ω_r (P_rs C_rs)^(1−σ),   (19)

such that the destination constraint Σ_r X_rs = R_s W_s is fulfilled. Hotelling's derivative property says that the partial elasticity of P_s with respect to P_rs is the share of region r in the procurements of region s. These results on CES demand and CET supply are standard in microeconomics. For proofs we have to refer to microeconomic textbooks.


To solve for equilibrium prices and flows, one can solve Eq. (17) for the prices $P_{rs}$ and insert them into Eqs. (18) and (19) to obtain a nonlinear system of 2n equations in the 2n unknowns $W_r$ and $P_s$. To be precise, one of these equations is redundant and only relative prices are determined. An overall price level can be fixed arbitrarily (Walras's Law). Conditional on given incomes, the solution is Eq. (14) with constraints (15) and γ = (σ − 1)(η + 1)/(η + σ). Bergstrand derives the slightly different exponent σ(η + 1)/(η + σ). The reason is that he calculates flows in mill prices without the iceberg assumption.

Modern applications of the specific goods model typically refer to Anderson and van Wincoop (2003), published almost two decades later. But it is just the special case of Bergstrand's model with perfect transformability on the supply side, i.e., the limiting case with η tending to infinity, implying γ = σ − 1. Mill prices $P_r$, uniform across destinations, then replace the destination-specific prices $P_{rs}$. Choosing units such that one unit of resources produces one unit of output, they can be identified with the resource prices, $P_r = W_r$. Equation (17) then condenses to

$$X_{rs} = \omega_r \left(\frac{P_r C_{rs}}{P_s}\right)^{1-\sigma} R_s P_s,$$

and

$$P_r^{\sigma} = \frac{\omega_r}{R_r} \sum_s \left(\frac{C_{rs}}{P_s}\right)^{1-\sigma} R_s P_s \qquad (20)$$

replaces Eq. (18). It follows from the supply constraint $\sum_s X_{rs} = R_r P_r$. Equation (19) remains unchanged. Equation (20) shows that in this model version, too, the resource price $P_r$ goes down if the resource stock expands, but less than in the Cobb-Douglas case, such that nominal income goes up. The respective elasticity is 1 − 1/σ. For σ = 5, which is at the lower end of what is typically assumed in empirical applications, a 1% increase of $R_r$ leads to an income increase of 0.8% (neglecting price responses in other regions).
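The elasticity 1 − 1/σ can be read off Eq. (20) directly; a one-line derivation for the small-region case, holding all other regions' prices fixed:

$$P_r^{\sigma} = \frac{\omega_r}{R_r}\sum_s \left(\frac{C_{rs}}{P_s}\right)^{1-\sigma} R_s P_s \;\Rightarrow\; P_r \propto R_r^{-1/\sigma} \;\Rightarrow\; R_r P_r \propto R_r^{1-1/\sigma},$$

so a 1% expansion of $R_r$ raises nominal income $R_r P_r$ by 1 − 1/σ percent, which is 0.8 for σ = 5.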

7.2 The Monopolistic Competition Model

From here it is only a small step to Krugman's (1979) monopolistic competition model, suggested in a seminal paper published in the same year as Anderson's Cobb-Douglas version of the specific goods model. His and Helpman's textbook (Helpman and Krugman 1985) puts it into the perspective of, and merges it with, the entire universe of trade theories of the time, and the model has become the core of the New Economic Geography for which Krugman was eventually honored with the Nobel Prize. In formal terms, the difference between his model and the specific goods model (in the perfect transformation version) looks tiny: the share parameter $\omega_r$ is replaced with the number of varieties produced in r, which turns out to be proportional to the
resource stock $R_r$ (and can in fact be identified with the resource stock, when units are chosen appropriately). This has no consequences for the conditional trade equation or for the solution of the full structural model, as long as resource stocks remain constant. But it does have consequences for the prices and the conclusions from counterfactual experiments when resource stocks change.

In the monopolistic competition model, firms are assumed to have market power allowing them to choose a supply price exceeding the marginal cost. The CES form of preferences, combined with the iceberg version of transport cost, has a very special implication for the mark-up factor: it is the constant σ/(σ − 1) > 1. Firms, each producing one variety, are assumed to use the resource for a fixed input for setting up a firm and for a constant marginal input. Firms enter or exit the market until profits are driven to zero. These assumptions imply that output per variety is a constant and the number of varieties is thus proportional to the resource stock. Output varies by entry or exit of firms (the extensive margin), not by variations of output per firm (the intensive margin).

When the stock of resources in a small region increases, such that price responses elsewhere are negligible, the price does not decline at all. This can be seen in Eq. (20), where the fraction $\omega_r/R_r$ is replaced with one, because $R_r$ is substituted for $\omega_r$. The expanding supply generates its own expanding demand by extending the range of product diversity. Income expands by the same proportion as the resource stock (neglecting price responses elsewhere). Furthermore, on the demand side an expansion of supply in a region of origin (i.e., an increase of $R_r = \omega_r$ in Eq. (19)) reduces the cost of living index $P_s$ in s. This is the forward linkage effect due to the preference for diversity built into the model by the CES function, with variables $R_r$ replacing the fixed parameters $\omega_r$.
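The constancy of output per variety, and hence the proportionality of the number of varieties to the resource stock, follows from combining mark-up pricing with the zero-profit condition. A minimal sketch, writing f for the fixed input and m for the marginal input per unit of output (symbols introduced here for illustration only):

$$P_r = \frac{\sigma}{\sigma-1}\, m W_r, \qquad (P_r - m W_r)\,q - f W_r = 0 \;\Rightarrow\; q = \frac{f(\sigma-1)}{m},$$

so every firm absorbs $f + mq = f\sigma$ units of the resource, and the number of varieties in r is $n_r = R_r/(f\sigma)$, proportional to $R_r$ as claimed.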

7.3 The Ricardian Model

The third gravity trade model to be reviewed here is the Ricardian model introduced by Eaton and Kortum (2002). The authors return to the assumptions of perfect competition and an exogenous range of goods. Unlike the specific goods model, however, they allow each good to be produced in any region, which is more plausible than the somewhat artificial assumption that consumers distinguish goods not by their physical properties but by the place of origin, which they most often do not even know! It would be equally implausible to assume that while consumers in fact only care about the physical properties of goods, each physically distinguishable good can be produced in only one place of the world. Most goods can be produced in most places of the world (at least in principle), but productivities differ. Not only general levels of productivity differ, but also productivity ratios between goods, giving rise to specialization according to the principle of comparative advantage. Consumers choose the cheapest source for each good they consume, including transport cost of the iceberg type. As it is impossible to know the details about productivities of everything everywhere, the authors assume productivities to be random variables drawn from a
Fréchet distribution. A random variable is Fréchet distributed if and only if its log is Gumbel distributed. We thus prefer explaining the theory referring to Gumbel-distributed logs of productivities rather than Fréchet-distributed productivities, for two reasons: first, because we can draw on formal results that we are already familiar with from the random choice model, and second, to show that the logit random choice model and the Eaton and Kortum model turn out to be isomorphic after certain transformations of variables.

Let the resource input per unit of a commodity produced in region r be $\exp(-\epsilon - \mu_r)$, with a random component $\epsilon \sim G(\theta)$ and a nonstochastic shifter $\mu_r$. The higher is μ, the more productive is a region, on average. A customer in s chooses a producer in an origin r minimizing the inclusive price, that means maximizing

$$u_{rs} = \epsilon + v_{rs}$$

with $v_{rs} = \mu_r - \log P_r - c_{rs}$ and $\epsilon \sim G(\theta)$. We have seen already in Sect. 6 that the maximal $u_{rs}$ attained in s is the random variable

$$\max_r u_{rs} = \epsilon + v_s \quad \text{with} \quad \exp(\theta v_s) = \sum_r \exp(\theta v_{rs}). \qquad (21)$$

The probability of choosing region r as a supply region for buying goods in s is

$$\mathcal{P}_{rs} = \omega_r \exp(\theta(p_s - p_r - c_{rs})), \quad \text{with} \quad \exp(-\theta p_s) = \sum_r \omega_r \exp(-\theta(p_r + c_{rs})), \qquad (22)$$

$p_r = \log P_r$ and $\omega_r = \exp(\theta \mu_r)$ (see Eqs. (12) and (13)). We also repeat that $\max_r u_{rs}$ as given in Eq. (21) is also the maximum of $u_{rs}$ conditional on region r being the best choice. This implies that the expected expenditure per unit of utility conditional on region r being the best choice is the same for all regions of origin (provided the expectation exists). Therefore, $\mathcal{P}_{rs}$ is not only the probability of r being the best choice, but also the expenditure share. We hence finally obtain the trade flow

$$X_{rs} = \omega_r \exp(\theta(p_s - p_r - c_{rs})) X_s = \omega_r \left(\frac{P_r C_{rs}}{P_s}\right)^{-\theta} R_s P_s$$

with constraints $\sum_s X_{rs} = R_r P_r$ and $\sum_r X_{rs} = R_s P_s$. Conditional on given incomes, the solution is Eq. (14) with Eq. (15) and γ = θ. The solution of the full structural model
is a nonlinear system of 2n equations (the constraints) in 2n unknowns (one of them redundant), as before. The model is identical with the specific goods model with perfect transformability, but the interpretation of γ is completely different: it is a characteristic of consumer preferences in the latter (γ = σ − 1) but a characteristic of productivity differentials (γ = θ) in the Ricardian model. The expenditure shares in the Ricardian model are just the choice probabilities in the logit model, where the lower case prices and distances p and c are the logs of the upper case prices and distances P and C. The true cost of living index in destination s is $P_s = \exp(p_s)$. The Hotelling derivative property requires the partial elasticity of $P_s$ with respect to $P_r$ to be the expenditure share. This partial elasticity is the partial derivative of $p_s$ with respect to $p_r$, which is $\mathcal{P}_{rs}$, as one can easily check using Eq. (22). We have thus shown the perfect isomorphism between the logit model of random choice and the Ricardian gravity trade model.
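The check using Eq. (22) is indeed a one-liner: differentiating $\exp(-\theta p_s) = \sum_r \omega_r \exp(-\theta(p_r + c_{rs}))$ with respect to $p_r$ yields

$$\frac{\partial p_s}{\partial p_r} = \omega_r \exp(\theta(p_s - p_r - c_{rs})) = \mathcal{P}_{rs},$$

the expenditure share, exactly as Hotelling's derivative property requires.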

8 Conclusion

Looking back, the similarity between the first known gravity formula in a passenger travel analysis of an Austrian railway engineer in the nineteenth century and gravity models in most recent applications is striking. While this is a convincing proof of the model's strength, one might also argue that it indicates little scientific progress in understanding spatial interaction in social sciences: the practical empirical research has stuck to the same tools; theoretical innovations just delivered new soundtracks to the same movie. But beyond the fact that a new soundtrack to an old movie may be an exhilarating experience in itself, the movie has also changed in the light of new theories.

The most fundamental innovation in terms of econometric applications was the introduction of origin and destination constraints and of the balancing factors connected to them. An example of missing communication between social science disciplines is the fact that it took three decades until mainstream economics learned about the necessity to care about these factors – baptizing them "multilateral resistance terms" – when estimating resistance functions by gravity models. This is all the more notable as gravity trade estimates washing out the balancing factors as fixed effects appeared already a decade earlier in the regional science literature (see Bröcker and Rohweder 1990, and references there to German language publications in the early eighties).

And more than that has been gained by theoretical progress. First, the interpretation of the terms in the model is now much more lucid, mainly due to clarifying the relations between balancing factors, prices, and consumers' surplus. Second, setting up structural forms now teaches us how to do comparative static analysis in counterfactual experiments. Reduced conditional forms are silent in this regard. Third, the theory allows for drawing welfare conclusions in such experiments. And fourth, setting up a gravity model by restricting the theory to special functional forms of preferences, technologies, or random distributions opens the way to generalizations by using different functional forms and distributions, introducing heterogeneous agents and so forth. So far, the trade literature focused on
heterogeneous firms, while the random choice literature focused on heterogeneous consumers. Heterogeneity of both is one of the many perspectives for model generalization yet to be developed.

9 Cross-References

▶ Factor Mobility and Migration Models
▶ Interregional Input-Output Models
▶ Interregional Trade: Models and Analyses
▶ Land-Use Transport Interaction Models
▶ Network Equilibrium Models for Urban Transport
▶ Spatial Econometric OD-Flow Models
▶ Travel Behavior and Travel Demand

References

Alonso W (1978) A theory of movements. In: Hansen NM (ed) Human settlement systems: international perspectives on structure, change and public policy. Ballinger, Cambridge, MA, pp 197–212
Anderson JE (1979) A theoretical foundation for the gravity equation. Am Econ Rev 69(1):106–116
Anderson J, van Wincoop E (2003) Gravity with gravitas: a solution to the border puzzle. Am Econ Rev 93:170–192
Armington PS (1969) A theory of demand for products distinguished by place of production. Int Monet Fund Staff Pap 16(1):159–178
Batten DF, Boyce DE (1986) Spatial interaction, transportation, and interregional commodity flow models. In: Handbook of regional and urban economics, vol 1. North Holland, Amsterdam, pp 357–401
Bergstrand JH (1985) The gravity equation in international trade: some microeconomic foundations and empirical evidence. Rev Econ Stat 67(3):474–481
Bergstrand JH, Egger P (2013) Gravity equations and economic frictions in the world economy. In: Bernhofen D, Falvey R, Greenaway D, Kreickemeier U (eds) Palgrave handbook of international trade. Palgrave Macmillan, London, pp 532–570
Boyce D (2014) Network equilibrium models for urban transport. In: Fischer MM, Nijkamp P (eds) Handbook of regional science. Springer, Berlin, pp 759–786
Bröcker J, Rohweder HC (1990) Barriers to international trade: methods of measurement and empirical evidence. Ann Regional Sci 24(4):289–305
Bröcker J, Dohse D, Rietveld P (2019) Infrastructure and regional development. In: Capello R, Nijkamp P (eds) Handbook of regional growth and development theories: revised and extended second edition. Edward Elgar, Cheltenham, Chap. 9
Carey HC (1858) Social science. J. P. Lippincott, Philadelphia
Carrothers GAP (1956) An historical review of the gravity and potential concepts of human interaction. J Am Inst Plann 22(2):94–102
Debreu G (1960) Review of R. Luce, individual choice behavior. Am Econ Rev 50(1):186–188
Domencich TA, McFadden D (1975) Urban travel demand: a behavioral analysis. North-Holland, Oxford
Eaton J, Kortum S (2002) Technology, geography, and trade. Econometrica 70(5):1741–1779
Erlander S (1980) Optimal spatial interaction and the gravity model. Lecture notes in economics and mathematical systems: operations research, computer science, social science. Springer, Berlin
Evans S (1973) A relationship between the gravity model for trip distribution and the transportation problem in linear programming. Transp Res 7:39–61
Helpman E, Krugman PR (1985) Market structure and foreign trade: increasing returns, imperfect competition, and the international economy. MIT Press, Cambridge, MA
Isard W (1960) Methods of regional analysis: an introduction to regional science. MIT Press, Cambridge, MA
Krugman P (1979) Increasing returns, monopolistic competition, and international trade. J Int Econ 9:468–480
Lill E (1889a) Die Grundgesetze des Personenverkehrs. Zeitschrift für Eisenbahnen und Dampfschiffahrt der Österreich-Ungarischen Monarchie II 35:698–705
Lill E (1889b) Die Grundgesetze des Personenverkehrs (Fortsetzung). Zeitschrift für Eisenbahnen und Dampfschiffahrt der Österreich-Ungarischen Monarchie II 36:713–725
Luce RD (1959) Individual choice behavior: a theoretical analysis. Wiley, New York
Niedercorn JH, Bechdolt BV (1969) An economic derivation of the "gravity law" of spatial interaction. J Regional Sci 9(2):273–282
Reilly WJ (1931) The law of retail gravitation. William J. Reilly, New York
Stewart JQ (1941) An inverse distance variation for certain social influences. Science 93(2404):89–90
Stewart JQ (1947) Suggested principles of "social physics". Science 106(2748):179–180
Stewart JQ (1948) Demographic gravitation: evidence and applications. Sociometry 11(1/2):31–58
Tinbergen J (1962) Shaping the world economy: suggestions for an international economic policy. Periodicals Service Co., New York
Wilson AG (1967) A statistical theory of spatial distribution models. Transp Res 1(3):253–269
Wilson AG (1971) A family of spatial interaction models, and associated developments. Environ Plan 3(1):1–32

Regional Science Methods in Business

Graham Clarke

Contents
1 Introduction
2 Macroscale Models
3 Mesoscale Models
4 Microscale Models
5 Conclusions
6 Cross-References
References

Abstract

This chapter examines the links between regional science and its techniques and the business and commercial environment. Firstly, the focus is on the early years of the discipline and some important links with the private sector seen through the historical lens provided by Isard (History of Regional Science and the Regional Science Association International, Springer, Berlin, 2003). Then the chapter focuses on applications of macro models, in particular commercial applications of input-output modeling. This is followed by a focus on mesoscale models. The history of gravity and spatial interaction models in business is explored with a focus on applications in motor vehicle retailing and financial services. Lastly commercial applications of spatial microsimulation are highlighted, with applications given from work undertaken in the UK, Ireland, and Australia. Finally the benefits of applied commercial work are briefly reviewed in the conclusion.

G. Clarke (*)
School of Geography, University of Leeds, Leeds, UK
e-mail: [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_132


Keywords

Regional science methods · Business and commerce · Input-output models · Spatial interaction models · Spatial microsimulation

1 Introduction

A review of the history of the Regional Science Association International (as depicted in Isard 2003) shows that not all participants of the early conferences in the USA were based at academic institutions, with a consistent sprinkling of practitioners present throughout those early days. One noticeable community came from the regional federal reserve banks of the USA. Although these were not strictly private companies, they certainly competed with the private banks in the marketplace and were run very much on commercial lines. Not surprisingly, these federal reserve banks were interested in regional fiscal issues, and a number of papers were presented by their employees around regional financial resources, regional credit expansion, inter-regional flows of funds, and income estimation. [One such employee of these banks was Charles Leven who, having later joined academia, became a stalwart member of the regional science academic community, working on economic base theory and regional growth.] Alongside these bank employees, the early conferences saw presentations by speakers from a number of private sector organizations, including Standard Oil Co. of New Jersey (a forerunner of Exxon), Hood Rubber Company, Bell Telephone Laboratories, and Vitro Corporation. Also, as the RAND Corporation (originally a nonprofit organization based on aerospace research) developed models in the social sciences, its employees appear in the conference programs from time to time.

In terms of business consultancy, it is interesting to observe that in those early days there were also a number of participants from the business commercial sector (Isard 2003). The most notable was Howard Green. His first appearance is at the "Regional Research Meeting" in Washington, DC, in 1952. At that time he was the head of location planning at Stop and Shop, a major US grocery firm operating mainly in the northeast USA. Not surprisingly, he appears in the occasional sessions on retail geography held at various conferences. Those sessions included a number of papers around the theme of gravity models (described by Isard as being very strange to economists – almost quackery!). Indeed, in 1954 Green presented gravity models for the production of supermarket catchment areas for a "real organization" at the American Association of Geographers meeting in Philadelphia. One can only speculate that the analysis was for Stop and Shop! In 1961, now working for the US retailer Montgomery Ward, he described his role in the store research and development department and the national growth program he helped to implement using what we would now term routine geodemographic analysis of populations and current store locations (Green 1961). He founded his own consultancy company (Howard L Green and Associates) in 1965 and was a leading exponent of applying quantitative methods in marketing. His programs were later sold to Gravitec, who
produced a software package called Siteplus, a gravity model-based system still used by many leading US retailers today (Beitz 2018). In 1979 he wrote an important paper with David Rogers (then at Retail Research, Inc., but who later went on to found a research consultancy, DRS Marketing Systems), again using data obtained from a major grocery firm (Rogers and Green 1979: the client in this case is described as a "regional supermarket chain"). The company, revenues, etc. remained confidential, but the paper showed the importance of using academic techniques, this time based on analogue modeling and spatial statistics, in retail location planning. In the 1960s another interesting character was Jack Ransome, who published a paper similar to Green's in the journal Economic Geography, this time discussing the location strategies he was involved with in helping to grow the Kroger brand in the USA (Ransome 1961).

As regional science has grown and blossomed since the 1950s and 1960s, it is undoubtedly true that the discipline is now much stronger in relation to applied work. In the last 10–20 years, the regional science journals have carried many more examples of work undertaken in collaboration with national and local governments and, in many cases across Europe, with various departments of the EU/European Commission. This growth in applied work was recognized by the RSAI Council in 2008 with the launch of the new journal Regional Science Policy and Practice to help encourage and promote applied work more widely. Peter Batey (see Chapter ▶ "Regional Science Methods in Planning", in this handbook) presents a review of the link between applied regional science and planning in the broadest sense.

The aim of this chapter is to consider the relationship between regional science and the business community, especially private sector companies, building on the short introduction above. This is actually a very difficult task! The biggest problem is that so much of the work undertaken for private companies comes under the label of "consultancy" and is thus usually considered private and confidential (especially in relation to the data sets made available to academics). There are therefore very few case studies in the literature outlining these types of applications, even though one suspects many regional scientists have been involved in such work. That said, it is also a fact that some academics have never wanted to engage with this sort of research, often labeling it as "gray" (the color of the suits that need to be worn for business meetings, and also the nature of the work undertaken: routine applications of well-known models/methods). Bell (2007) presents this view forcefully, albeit in the context of critical human geography. Nevertheless, there are a number of examples in the public domain, and this chapter aims to show that there are some applications we can draw upon. The chapter is structured around a selection of regional science methodologies commonly applied at different spatial scales (macro, meso, and micro).

2 Macroscale Models

Many regional scientists have a background in economics and draw upon spatial econometric modeling techniques in their research. The most common macro models seen in regional science are arguably input-output and related models such as CGE (computable general equilibrium) models. These models are well-known techniques for looking at the economic and social impacts of major investments. For every job created in a new investment scheme (a factory, university, hospital, transport hub, etc.), they can estimate the number of jobs directly generated at the construction and operational stages in that industrial sector, as well as jobs generated in other linked industrial sectors – the so-called multiplier effects. The analysis can be disaggregated to the regional level provided regional industrial and employment data sets are available. Such models are much less common below the regional level (though see Hewings et al. 2001 for a nice example of an intra-urban model).

Again, the literature on applied models is plentiful, especially for governments keen to evaluate the regional impacts of their investments in major projects. After the 1960s and 1970s, some of these models became available as off-the-shelf applications, which undoubtedly gave rise to many (often unreported) applications in the business sector. Such models include REMI, RIMS, IMPLAN, and MEPLAN, and comparisons of the structure and outputs of a number of these packages are reviewed in Rickman and Schwer (1995). Undoubtedly, major academic centers of input-output modeling around the world have also undertaken many commercial applications. REAL at the University of Illinois, for example, has many (sadly unpublished) applied studies around topics such as the impacts of higher education establishments, residential housing foreclosures, the impact of immigration on future economies, and tourist impacts (Hewings 2009, private communication). Similarly, the strong US modeling groups at West Virginia and George Mason Universities and European groups at Strathclyde, Groningen, Milan, Amsterdam, and Barcelona have many applied studies on topics such as energy modeling, regional economic development, environmental planning, and natural resource management, which can be accessed via their respective web sites.

As noted above, the impact of higher education establishments on regional economies has been a common application area in general. Kelly et al. (2004), however, provide a good example of a study actually commissioned for a major university – Strathclyde in Glasgow. Universities are funded by both public and private finances but run very much as commercial entities, and as such are keen to show their importance in the local economy for their own marketing and public relations purposes. The input-output framework allowed Kelly et al. (2004) to consider the direct expenditure of the university, the indirect expenditure (money spent on purchasing goods and services), and the induced impacts (money spent by university employees). They could do this for the expenditures of the university, its employees, its students, and its visitors. The study showed that a direct expenditure of £165 million by the university in 2002–2003 generated an additional £103.7 million in indirect impacts, with around 80% staying in the Scottish economy. The models could also break this down by industrial sector. The students meanwhile spent an estimated £20 million, with multiplier effects taking the total nearer to £26 million, the equivalent of 300 additional full-time jobs. Finally, they analyzed the spending of visitors to the university – £6 million in direct spending, leading to a total (after multipliers) of £9 million (120 FTE jobs).
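The multiplier arithmetic behind figures like these is the Leontief inverse, x = (I − A)⁻¹ f. A minimal two-sector sketch (the coefficients and final demand are invented for illustration, not taken from any of the studies cited here):

```python
import numpy as np

# Technical coefficients: A[i, j] = input required from sector i
# per unit of output of sector j (invented numbers).
A = np.array([[0.20, 0.10],
              [0.30, 0.15]])

f = np.array([100.0, 50.0])         # final demand shock, e.g. new spending

L = np.linalg.inv(np.eye(2) - A)    # Leontief inverse (I - A)^-1
x = L @ f                           # total output: direct + indirect effects

print(x)                            # output required from each sector
print(L.sum(axis=0))                # simple output multipliers by sector
```

The column sums of the inverse give the familiar output multipliers: total output generated across all sectors per £1 of final demand placed on a given sector.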
Having identified industries prevalent within Glasgow itself, the models were also used to estimate the impact on the Glasgow economy rather than the Scottish economy as a whole. For every £1 million spent by the university, they estimated £0.23 million of output was generated in other industries in Glasgow. As far as the university was concerned, the final message makes for excellent publicity material: the university generated over £300 million of output in Scotland, with 68% of this in Glasgow itself, making its impact on the economy "substantial" and "increasing Scotland's absorptive capacity" (Kelly et al. 2004, p. 63).

Another interesting range of applications of input-output modeling for business can be found in the web pages of the Scottish consultancy 4-consulting, led by the economist Richard Marsh, who was also involved in the Strathclyde case study outlined above. In 2015 the company was asked to undertake a study of the contribution of the Scotch whisky industry to the UK economy (commissioned by the Scotch Whisky Association). The model estimated 40,000 jobs to be connected to the production and distribution of whisky, with £1.4bn generated in wages. A similar study was commissioned by the privately owned Gleneagles Hotel in Scotland (a major worldwide destination for golf holidays). The models applied in this case estimated the direct and indirect employment benefits, with a combined total of 2,000 jobs strongly connected to the hotel, generating £37 million in household income. They also evaluated the impact of the Scottish racing industry on the Scottish economy for the private firm "Scottish Racing."

Another consultancy based in Scotland is CogentSI, with Hervey Gibson as one of its chief economists. They have also worked with many public and private organizations, using their DREAM model. The DREAM model consists of a set of local input-output tables for 150 or so industries and products that allow the detailed modeling of trade flows and interindustry linkages. The local tables are at NUTS4 (sublocal authority) level in Scotland and at NUTS3 level for the rest of the UK and Ireland. The model describes the production, consumption, and flows between the different industrial sectors and has been used in applications based on the multiplier impacts of the recreational fishing industry, the impacts of tourism in Northern Ireland and the Orkney islands, and the impact of the development of wind farms in Scotland.

Another good example of published work undertaken in collaboration with a private organization is a study partly commissioned by British Aerospace (a former state-run organization sold off by the UK government during the 1980s), presented in Batey and Madden (2001). As they explain, the client wanted to analyze the impacts of expanding Liverpool Airport, including the building of new terminals and improving nearby transport accessibility: "of crucial importance to the client's case was the likely impact on local employment of the development . . . in one of the most economically depressed regions of the UK" (Batey and Madden 2001, p. 38). Clearly, the more positive the benefits, the more likely that planning permission would be approved. The power of the input-output analysis was that various scenarios around airport size and capacity could be tested in terms of various types of economic and social benefits (given that this model included more demographic components than are found in most input-output style modeling). The models were based on the Leontief input-output structure, but were extended to include a number of demographic and labor market variables to enable British Aerospace to assess the impact upon unemployment rates in the areas under study and in neighboring areas such as Manchester (Batey and Madden 2001). As well as the usual interindustry linkage analysis, the study examined the impacts on government revenue and expenditure, a powerful line of argument if the client has to persuade the government to go ahead with a scheme. In this case the model estimated that the expansion predicted by 2005 would result in 19,420 jobs (direct and indirect), of which 13,189 would be taken up by unemployed residents in the catchment area of the airport, including 67% in the Liverpool area itself. Those individuals moving from unemployed to employed status would produce a £62 million increase in tax revenue for the government while at the same time reducing the benefits bill by £43.97 million (a gross revenue increase therefore of £130 million). It is interesting to note that in 2002 a £42.5 million new terminal was opened at the airport, tripling its size and passenger capacity.

Another batch of macro models could be argued to be those related to large-scale land-use/transportation models. Although these often have elements of spatial interaction (which allows a disaggregation to regions within cities), they often incorporate input-output models too. Given that this is such an important topic in regional science, it is not surprising that there are now both a number of well-known regional scientists who have been engaged with applied land-use/transportation modeling work (e.g., David Boyce, Steve Putman, Alex Anas, John Landis, T.J. Kim, Paul Waddell, David Hensher, and Michael Wegener) and consultancy firms providing software packages for both the public and private sectors (the Martin Centre – MEPLAN, Modelistica – TRANUS, Marcial Echenique & Partners – MEPLAN, David Simmonds Consultancy – DELTA, Institute for Spatial Planning Dortmund – IRPUD). Simmonds (2001), for example, provides a nice case study based around different scenarios for using both greenfield and brownfield sites for housing developments in the north of England. This is a good example of a major environmental assessment of land-use transport modeling options for new housing scenarios, sponsored by a consortium of public and private agencies.

3 Mesoscale Models

Mesoscale models in regional science tend to have more spatial disaggregation than the macro input-output style models and have generally been more a product of geography than economics. Spatial interaction models, for example, are based on many years of academic model development in geography, growing out of research on the early gravity models mentioned above. The models were given a more robust theoretical foundation after Alan Wilson's derivation of the model through entropy maximization (Wilson 1970). These models have provided a rich source of theoretical developments in understanding urban spatial structures and have a rich history of applied work in regional science (Roy and Thill 2004; Fischer and Getis 1999). The aggregate model can be written as

$$S_{ij}^m = O_i^m A_i^m W_j^{\alpha^m} \exp(-\beta^m d_{ij})$$

where:

$S_{ij}^m$ = expenditure by household type m in residence zone i at destination j
$O_i^m$ = level of consumer expenditure of household type m in residence zone i
$A_i^m$ = a balancing factor calculated as

$$A_i^m = \frac{1}{\sum_j W_j^{\alpha^m} \exp(-\beta^m d_{ij})}$$

$W_j$ = the attractiveness of destination j
$\alpha^m$ = a parameter reflecting the perception of a destination's attractiveness by household type m
$d_{ij}$ = the distance between origin i and destination j
$\beta^m$ = the distance decay parameter for household type m

The model can be used to estimate any type of flow in a city or region. For example, in a retailing context, $S_{ij}^m$ is the flow of expenditure from residential zone i to shops (or shopping centers) in j; $O_i^m$ is the total expenditure of residents of i; $W_j$ is the measure of the attractiveness of stores in j, usually measured via floor space as a proxy; and $d_{ij}$ is the measure of travel distance or cost between i and j. In the case above, the models are disaggregated by person type m to show that the model can routinely handle variations in flow patterns by, say, person age or social class.

These models have been adapted for use in a wide range of application areas. They have been used to examine flows of people to shops, offices, work, schools, and hospitals and, in economic geography, flows of goods between regions. Indeed, Fischer and Getis (1999) remark that models of spatial interaction have been fundamental in the history of regional science. They have also been key to work undertaken at the University of Leeds by Alan Wilson and his colleagues. During the 1980s they set out on a mission to see whether the private sector could be persuaded of the benefits of using such models in a commercial setting. After a successful project with Toyota GB, others quickly followed. By 1993 they had formed a university-owned company called GMAP (Geographical Modelling and Planning) Ltd., which by the mid-1990s employed over 120 graduates (mainly geographers) and enjoyed a revenue of around US$10 million per year. International clients have included Exxon, Toyota, Ford, and BP, while UK clients have included ASDA, Sainsbury's, Barclays Bank, Thorn EMI, Whitbread, Storehouse, and WHSmith (among many others).

Although there are slight variations in what these individual organizations want from this collaboration, we can generally group their requirements together. For banks, oil companies, car manufacturers, and so on, channel management and network planning are key strategic and operational issues. Once the spatial interaction models have been built and calibrated, they can be used for analysis purposes.
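In implementation terms the model is a few lines of array arithmetic. The sketch below computes the flow matrix and the balancing factors for a single household type (a minimal NumPy version; the function name, zone data, and parameter values are invented for illustration):

```python
import numpy as np

def retail_flows(O, W, d, alpha, beta):
    """Production-constrained spatial interaction model, one household type.

    O     : (n_zones,) expenditure available in each origin zone (O_i)
    W     : (n_stores,) attractiveness of each destination (W_j)
    d     : (n_zones, n_stores) distance or travel cost matrix (d_ij)
    alpha : attractiveness perception parameter
    beta  : distance decay parameter
    Returns S with S[i, j] = O_i * A_i * W_j**alpha * exp(-beta * d_ij).
    """
    u = (W ** alpha) * np.exp(-beta * d)  # perceived utility of each store
    A = 1.0 / u.sum(axis=1)               # balancing factors A_i
    return (O * A)[:, None] * u

# Toy example: three demand zones, two stores (numbers invented)
O = np.array([100.0, 80.0, 60.0])
W = np.array([5.0, 2.0])
d = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [1.5, 1.5]])
S = retail_flows(O, W, d, alpha=1.0, beta=0.5)
D = S.sum(axis=0)                         # predicted store revenues D_j
assert np.allclose(S.sum(axis=1), O)      # origin constraint is honored
```

The balancing factors guarantee that each zone's expenditure is fully allocated across destinations, which is precisely what distinguishes these constrained models from naive gravity formulations.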


A typical product of the flow model for retailing is revenue: the sum of the flows to each center or store ($D_j$). Given that the client retailer knows their store revenues and usually has some data on customer home locations (through the growing use of loyalty cards), the models can be calibrated much more effectively than in the past. This helps to build confidence in the level of accuracy of such models – in most markets the models need to be within 10% of real turnovers (when known), 90% of the time. If this is the case, the company can explore predicted turnover estimated by the model against actual turnover (thus producing a measure of store performance) and use the models to estimate the turnover of all competitor stores (which they are unlikely to know).

Another important indicator produced by the models is small-area market shares (the percentage of sales estimated for each firm in each small-area demand zone). Most organizations know their national market share, and many will be able to disaggregate this to the regional level (and there are often market intelligence reports to help here). However, very few know their market shares at the local level (Birkin et al. 2017). Mapping local market shares provides an immediate snapshot of retail performance. It shows areas where the organization has fewer customers. These spatial gaps in the market, often called open points, become possible new store locations for any firm that is contemplating expansion.

Thus, having understood the existing performance of individual branches or outlets, the models are then used in "what-if?" fashion. Each of the variables given in the models can be modified, and the models rerun to estimate the impacts of changes. Take, for example, a major car manufacturer like Toyota or Ford with franchised branches (or dealers) throughout a country. The effects of simple or complex network changes may be evaluated, for example, the opening of new dealers, closure of existing dealers, dealer expansion, and dealer relocation. The model operates by simulating how customer patterns change for new network reconfigurations. The impacts upon sales and market share can then be evaluated, and this provides valuable information to the investment appraisal process. The model allows the impacts of various scenarios to be viewed from different angles: by dealer, by geographical area, by brand, and by product line. Toyota used the model to examine every open point in their network! They also undertook one-off exercises such as using the model to launch the Lexus car in the UK – the modeling task was to see how many Lexus cars the model would estimate could be sold from each dealer (the Lexus competing with top-of-the-range models sold by BMW, Jaguar, Mercedes, etc.). The top 100 locations were then chosen to house the Lexus in the showroom and market it accordingly. Birkin et al. (2017) give a large number of examples of models used in this way in various commercial sectors.

A significant advance in spatial interaction modeling (and in regional science generally) was the possibility of modeling the evolution of urban structures. This was first proposed by Harris and Wilson (1978). They argued that spatial interaction models could be set in a dynamic context: in the retail model, if the revenue of a store or center exceeds its costs, then that store or center is likely to grow. Conversely, if costs exceed revenue, then the location of the store/center is unstable, and it will decline.
This process continues in an iterative manner until the solutions reach equilibrium and stability, and no further growth or decline takes place (unless further change is introduced to any of the model's variables or parameters).
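Harris and Wilson formalized this adjustment as a simple dynamic on the floor space terms: $W_j$ grows where modeled revenue $D_j = \sum_i S_{ij}$ exceeds operating costs $kW_j$, and shrinks where it falls short. A sketch reusing the hypothetical retail_flows function above (the step size eps, the cost rate k, and the stopping rule are illustrative assumptions, not the specification of the commercial studies discussed here):

```python
def harris_wilson(O, W0, d, alpha, beta, k, eps=0.05, iters=1000):
    """Iterate W_j(t+1) = W_j(t) + eps * (D_j(t) - k * W_j(t)) toward
    the cost/revenue equilibrium described in the text."""
    W = W0.copy()
    for _ in range(iters):
        D = retail_flows(O, W, d, alpha, beta).sum(axis=0)  # revenue per center
        W = np.maximum(W + eps * (D - k * W), 1e-9)         # keep W positive
    return W  # approximate equilibrium floor space
```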


In moving from initial to final solutions, there can then be a consideration of the key processes of change which the variables and parameters of the models are trying to capture.

This dynamic model has also played a useful role in applied store location research. Clarke et al. (1998) show a real-world application for a major Canadian bank interested, like most retailers, in cutting costs. The problem with closing bank branches as a cost-cutting exercise is that there is a strong correlation between new customer attraction and physical bank locations (despite electronic banking, etc.; see Birkin et al. 2017). So the question is as follows: could a strategy be devised whereby branches are not necessarily closed, but costs are reduced by reducing the number of facilities or products offered at each branch? In other words, can the products and services available at each branch be reconfigured according to local consumer demand? The basic objective of exploring the dynamic model therefore was to identify the most unstable components of the delivery network from a consumer usage perspective. The cost/revenue equilibrium mechanism allowed for the identification and assessment of poor-performing outlets by product in respect to stability. Those branches identified as most unstable within the subset of poor performers became the branches where the removal of a specific activity from the delivery outlet was profitable. It was estimated that this plan would save $13.4 million in operating costs (approximately 9% of the total).

As seen above, models have typically been used to open or close single stores. A more challenging question to ask is as follows: what would an ideal distribution of branches look like? This is a relevant question when major network restructuring is planned or when new spatial markets are being considered (e.g., a firm expanding into a new country or region). Again, techniques developed in regional science become useful here. Theoretical advances in spatial interaction modeling included work on optimization (i.e., Wilson et al. 1981). The commercial environment offered the possibility of applying these new optimization models to a single region or to an entire country. A nice example is taken from Birkin and Culf (2001) and shown again in Birkin et al. (2017): the optimization work undertaken for Ford. The first challenge for the optimization model was to show the ideal locations for Ford dealers and compare these to the existing pattern of dealers. The model thus showed which dealers should ideally remain in situ, which should be closed or relocated, and where new ones should be opened. In the case of Ford in Denmark, for example, around 100 dealers were evaluated in this way. Following optimization and the reconfiguration of the entire network, and keeping a total of 100 dealers, average sales per dealer were estimated to increase from 158 to 202 vehicles, an increase of 27.7%. It was also estimated that the existing number of cars sold from the 100 dealers could be achieved from an optimal network of just 70 dealers. Of course this optimization plan would take years to complete, but it offered Ford a blueprint for long-term growth and change (and was implemented in many countries of the world).

Similarly, a different type of optimization model was applied by Birkin et al. (2002) in the financial services market. This model was applied following the major takeover by Barclays of the Woolwich Building Society in the UK.
For every town in the UK, an optimization model was designed to run a variety of strategies involving closing, relocating, and opening new branches (both Barclays and Woolwich branches). It was argued that such a model offered a huge improvement in terms of bank accounts retained or obtained anew compared to the likely outcome of the original plan of Barclays to simply close branches which were close to each other. In the Kent region of the UK, the strategy to close overlapping branches would reduce bank branch numbers from 67 to 57, leading to an estimated 14% reduction in new accounts. However, the optimization model suggested that with closures, relocations, and new branch openings, 57 "optimal" branch locations in Kent (thereby reducing the number of branches as Barclays really wanted to do) could lead to an 8% increase in new accounts generated per annum and an increase of 22 new accounts per branch.

It is likely that there are many similar stories of applied commercial modeling around the world. Certainly there are many geographers and planners who have set up consultancies similar to GMAP but perhaps with wider interests in transportation modeling. In the USA, Grant Thrall has done much to put business geography on the map both academically and in consultancy terms (see Thrall 2002), while the Dutch geographers Henk Scholten (Geodan) and Harry Timmermans (Eindhoven University) have done much consultancy work in this area. Also, researchers at Flinders University in Australia in the 1970s built gravity models for the Australian grocery retailer Coles (Robert Stimson 2019, private communication). In addition, there is a large volume of work in disciplines such as marketing, business management, and retail management (Beitz (2018) provides a good overview from these perspectives).

4 Microscale Models

In regional science there has also been a rich history of research undertaken in microeconomics, especially concerning the economic decisions and actions of individuals and firms. In geography, this type of analysis became common in the 1970s when behavioral geography became popular and more attention was paid to the choice sets available to individuals. Indeed, location theory was often built from the bottom up, with large-scale surveys driving the development of a range of discrete choice models.

In addition, since the 1970s, there has been a growing interest in a technique for creating new individual and household data known as spatial microsimulation. Spatial microsimulation involves the estimation of individual or household data using the widest range of available known data to reconstruct detailed microlevel populations. The most important output from the microsimulation process is a data set which gives an estimate of "missing data" – either new combinations of variables not available from the census or new data created by the fusion of a number of different data sets. However, once constructed, the models also provide another powerful set of techniques for forecasting and prediction – i.e., what are the impacts of changes in any of the variables estimated in the models? The use of the model will be illustrated below using three case studies. As with the other techniques discussed above, there are many applications of applied spatial microsimulation in the literature, again largely collaborations with government departments.

The first example presents a summary of the findings of a research project commissioned by Yorkshire Water in the UK, aimed at investigating the feasibility of producing microlevel population and water demand estimates in order to help inform strategic planning and other management decision-making processes (all the water companies in England and Wales are now privately owned). An important aspect of policy underpinning this research work was the impending (enforced) change in the basis of customer billing. Water companies in England and Wales traditionally based the majority of their billing on the rateable value of the properties in their area. This was recognized as an inappropriate methodology following the termination of regular surveys to update rateable values, and water companies now have to plan for an alternative means of billing customers. At the same time, efficient use of water emerged as a major policy issue, with water industry regulators in particular expressing concern over this matter. Further emphasis on efficient use of water resulted from an increased focus on sustainability, an issue that entered the public arena at the same time. One approach to the efficient use of water is conservation based on pricing as a tool to promote reduced use of water. Demand management had seldom been addressed in this way in the water industry, which to date had typically equated supply with demand. In addition to influencing the efficiency of water use by customers through pricing strategy, efficiency of water use also needed to be tackled through the direct control of leakage, particularly since high levels of leakage are an issue that has always concerned both the general public and the National Rivers Authority, the environmental regulator. To this end, the identification of leakage and the prioritization of leakage control strategies became an area of great importance to all water plcs. The use of microsimulation techniques in water demand estimation had the potential to aid the identification of high- and low-leakage areas, thus providing a basis for rational economic appraisal of leakage control measures by Yorkshire Water.

The methodological approach adopted during the course of the project was to estimate the characteristics of every household in the Yorkshire Water region (based around the major cities of Bradford, Sheffield, Leeds, and York). Six household and individual characteristics derived from the 1991 and 2001 censuses were incorporated into the Leeds microlevel population estimation model. These were household location (ward), household tenure, property type, number of rooms, number of household members, and socioeconomic group of the head of household. Each of the characteristics included was believed to be a direct or indirect determinant of household water demand. To calibrate the model, data on household water consumption were supplied by Yorkshire Water in the form of 4039 metered households selected from across Yorkshire. Analysis of these metered data allowed a much more effective quantification of the effect various household attributes have on household water demand. From this analysis it was found that three of the main household attributes above appeared to be most important, plus the number of water-using appliances, especially the presence/absence of a dishwasher and of a washing machine within each household. Thus, in addition to generating household conditions such as tenure and number of occupants for each household in Leeds, it was also necessary to estimate household ownership rates of water-using appliances such as washing machines and dishwashers. This information came from the General Household Survey in the UK and was matched to the census data through household tenure. The relationship between these five household attributes and predicted water demand may be expressed mathematically in the form of a regression model:

$$W = 0.0296k + 0.33\,OC + 0.257\,DW + 0.109\,WR + 0.0536\,BD - 0.0287\,HT$$

where:

W = water demand
k = constant
OC = number of household occupants
DW = presence/absence of a dishwasher in the household
WR = presence/absence of a washing machine in the household
BD = number of bedrooms in the household
HT = household property type

To recap, these estimates have been made using Monte Carlo sampling from the conditional probability of water demand given household property type, number of occupants, number of bedrooms, and presence/absence of a washing machine and dishwasher. The spatial mapping exercise revealed that areas on the eastern and northern fringes of Leeds and in Harrogate and York (generally the highest-income areas) had the highest average annual household water demand levels. Inner-city areas were shown to be those where the lowest average levels of household water demand were to be expected. Leakage was estimated by comparing known water supplied at various metered points against household predictions in the areas served by those meters. For more details of this application, see Clarke et al. (1997) and Williamson (2001).
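A compact sketch of the Monte Carlo step just described (the appliance ownership probabilities, tenure labels, and the numeric coding of property type are hypothetical placeholders; only the regression coefficients are those reported above):

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed ownership rates conditional on tenure; in the study these came
# from the General Household Survey, matched to the census via tenure.
P_DISHWASHER = {"owner-occupied": 0.35, "rented": 0.15}
P_WASHER = {"owner-occupied": 0.95, "rented": 0.80}

def simulate_water_demand(hh):
    """Monte Carlo-assign appliances to a synthetic household, then apply
    the fitted regression (property type coded as a number here)."""
    dw = rng.random() < P_DISHWASHER[hh["tenure"]]   # dishwasher present?
    wr = rng.random() < P_WASHER[hh["tenure"]]       # washing machine present?
    return (0.0296                                   # constant term k
            + 0.33 * hh["occupants"]
            + 0.257 * dw
            + 0.109 * wr
            + 0.0536 * hh["bedrooms"]
            - 0.0287 * hh["property_type"])

hh = {"tenure": "owner-occupied", "occupants": 3,
      "bedrooms": 3, "property_type": 2}
demand = np.mean([simulate_water_demand(hh) for _ in range(1000)])
```

Repeating the draw for every synthetic household in a ward and aggregating gives the small-area demand estimates that were mapped and compared against metered supply to infer leakage.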


3. Providing appropriate training and converting and reorientating agricultural production potential With increasing recognition that rural development is not synonymous with agricultural development (and with an increasing range and diversity of policy measures), there was a need to develop tools of analysis which would enable the impact of all types of rural development policy to be assessed ex-post and also to enable the potential impact of new policies to be assessed before implementation. Given this background, Teagasc was eager to embark on similar analyses in relation to changes in rural development policy. Accordingly a program of collaboration between the Rural Economy Research Centre and the University of Leeds (and later with the University of Galway) was started with a view to developing a model which would be capable of analyzing the differential impacts of changing rural development. The overall aim of this project was to develop a new framework for the analysis of the Irish rural economy based on spatial microsimulation. The first stage of the project involved the building of a static spatial microsimulation demographic and labor force model for the entire Irish rural economy. The task of this model was to provide reliable estimates of the spatial distribution of the Irish population, first in 1991 but then for 2001 and 2011, the census years. The spatial microsimulation models were then used to identify particular socioeconomic groups that may be affected by EU agricultural policy reforms. Three groups of farmers could be distinguished: 1. Full-time commercial farmers (low-cost efficient production of commodities – requiring scale and specialization – dairy farmers, intensive enterprises, largescale tillage, and livestock). This type of farmers comprised 30% of the total 2. Mixed-income farmers (off-farm jobs) comprising 30% of the total 3. Marginal/transitional farmers (more elderly, relying on social welfare payments for a significant part of their income – farms nonviable, no potential inheritors living on their farms) comprising 40% of the total What was important was to use the microsimulation to ascertain the geographical location of these particular population groups (and this is not publicly available in any one data set). One of the advantages of spatial microsimulation modeling is that it can be used to identify the size and spatial location of particular population groups like the above. The European Commission has calculated that 80% of the support provided by the Common Agricultural Policy (CAP) goes to the largest 20% of farmers. Using a microsimulation model, it is possible to provide estimates of CAP-related support at the small-area level and show first the spatial distribution of full-time farmers who own large farms and are classified as social class 1 or 2. It can be assumed that these farmers would likely have the capacity to have relatively high production and attract large amounts of CAP-related support. The model also estimated the spatial distribution of farmers who were older than 55 years old and had a farm with size less than 30 acres. These farmers would most likely be placed in category 3, the most marginal and vulnerable farmers to CAP reform. The model has

138

G. Clarke

since been used for many other policy applications in Ireland, for example, conservation and rural environmental protection schemes, modeling greenhouse gas emissions from agriculture, the location economics of biomass production for electricity generation, and spatial access to services across Ireland, especially health care. For more examples, see O’Donoghue et al. (2012). The third example is taken from applied work at NATSEM in Canberra, Australia. With an excellent track record of applied spatial microsimulation work to their name, the team was approached by a major commercial bank to understand more about income and debt, especially housing stress which can be strongly correlated with mortgage repayment problems and small-area variations in debt (Rahman and Harding 2014). [There are interesting similarities here to the interest shown by the US federal banks in regional financial issues identified in the introduction to this chapter.] The estimation of housing stress involves building a household model based on age, gender, social class, ethnicity, etc. using core census data (very similar to the Yorkshire and Irish models described above) and then adding more variables relating to wages and income from other household surveys to get a disaggregated household budget estimate. Then this data was further augmented by small-area data on house tenure and house price so that mortgages and rental cost could be compared to each household’s individual budget. Households spending more than an affordable expected proportion of its income on housing could then be labeled as suffering from household stress. The ability to aggregate from household data to small-area data allows the geography of housing stress to be mapped and analyzed. Using this methodology, Rahman and Harding (2014) estimated a housing stress level of 9.9% across all households in Australia, rising to 21.2% in certain Sydney small-area, low-income districts.
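The common thread in these three examples is the same micro-to-macro logic: attach attributes to synthetic households (by regression or by Monte Carlo sampling from conditional probabilities) and then aggregate the households to small areas. The sketch below illustrates that logic using the Leeds water-demand coefficients reported above; the synthetic household file, attribute probabilities, and area codes are hypothetical, so this is an illustrative toy rather than the published model.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 10_000  # hypothetical synthetic households spread over a handful of small areas

# A toy synthetic household file of the kind a spatial microsimulation produces;
# appliance ownership is sampled Monte Carlo-style from assumed probabilities
households = pd.DataFrame({
    "area":       rng.choice(["A01", "A02", "A03", "A04"], size=n),
    "occupants":  rng.integers(1, 6, size=n),    # OC
    "dishwasher": rng.random(n) < 0.35,          # DW (assumed ownership rate)
    "washer":     rng.random(n) < 0.90,          # WR (assumed ownership rate)
    "bedrooms":   rng.integers(1, 5, size=n),    # BD
    "prop_type":  rng.integers(1, 4, size=n),    # HT, coded as in the survey
})

# Predicted household water demand from the regression reported above
households["water_demand"] = (
    0.0296                               # k, constant
    + 0.33   * households["occupants"]
    + 0.257  * households["dishwasher"]
    + 0.109  * households["washer"]
    + 0.0536 * households["bedrooms"]
    - 0.0287 * households["prop_type"]
)

# Aggregating households to small areas yields the mappable demand surface
area_demand = households.groupby("area")["water_demand"].agg(["mean", "sum"])
print(area_demand)
```

The same skeleton serves the Irish and Australian applications: swap the regression for a farm-income or housing-cost rule, and the groupby step again turns household-level predictions into small-area estimates.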

5 Conclusions

This selective review has attempted to highlight a number of important applications of regional science techniques within the business and commercial sector. As a discipline, we should be proud that our work can be useful to so many different communities.

It is useful to end by speculating on the benefits of commercial work (cf. Birkin et al. 2014). Of course, it is hoped that the companies themselves benefit! Some academics have an issue with this – should we be seen to be helping private sector companies make more profitable decisions? It is, of course, a hard question to answer from an ethical or moral point of view. What might be said in reply is that if we as academics had, for example, helped Thomas Cook in its business development, then perhaps 9000 people in the UK might not have lost their jobs in 2019! Helping private sector firms can sometimes be the most important research we undertake to stimulate and/or protect regional economies and the people who depend on them.

Of course, applied consultancy-style work also brings benefits to academics and their universities. Birkin et al. (2014) estimate that between 1985 and 2005 (when it was largely sold off by the University), GMAP brought in £55–60 million to the University of Leeds, with a profit in the order of £10 million. In addition to the extra revenue on offer, applied commercial work can greatly enhance teaching materials (real case studies rather than simply numerical experiments) and can equip students with techniques that make them commercially in demand. But most importantly, perhaps, applied commercial work can lead to important theoretical extensions to models (a view which contradicts the argument in Bell (2007) that much consultancy or commercial work is mundane in nature). This has been especially the case with the spatial interaction modeling applications described in Sect. 3. Birkin et al. (2010) and Newing et al. (2015) show how models have to be disaggregated, extended, and modified in all sorts of ways to fully capture real-world behaviors. Models clearly have to work, and we cannot hide behind too many assumptions! The model-building process is aided by the data sets often at the disposal of the client, data to which academics would normally not get access (a process which should be enhanced by increasing amounts of big data being made available by private sector firms). So commercial work can lead to important theoretical model extensions and thereby to academic journal papers.

As noted in the introduction, the examples given here inevitably show just the tip of the iceberg in terms of potential applications that have been undertaken by regional scientists. Apologies to all those whose work with the private sector is unknown to me! Perhaps those with good examples could get in touch, and we could produce a future edited book with more detailed examples, especially across Europe and Asia. That would be a book of interest to many people!

6 Cross-References

▶ Regional Science Methods in Planning
▶ Spatial Interaction Models: A Broad Historical Perspective
▶ Spatial Microsimulation

Acknowledgments The author would like to thank Peter Batey, Eric Bowen, Geoff Hewings, Randy Jackson, Peter McGregor, Cathal O'Donoghue, Bob Stimson, Kim Swales, and Robert Tanton for comments and suggestions on applied regional science. Thanks in particular to Peter Batey for his thoughtful comments and suggestions on the text itself.

References

Batey PWJ, Madden M (2001) Socio-economic impact assessment: meeting client requirements. In: Clarke GP, Madden M (eds) Regional science in business. Springer, Berlin, pp 37–59
Beitz D (2018) Location analysis for business. Business Expert Research, Boston
Bell D (2007) Fade to grey: some reflections on policy and mundanity. Environ Plan A 39(3):541–554
Birkin M, Culf R (2001) Optimal distribution strategies. In: Clarke GP, Madden M (eds) Regional science in business. Springer, Berlin, pp 223–241
Birkin M, Clarke GP, Douglas L (2002) Optimising spatial mergers: commercial and regulatory perspectives. Prog Plan 58(4):229–318
Birkin M, Clarke GP, Clarke M (2010) Refining and operationalizing entropy-maximizing models for business applications. Geogr Anal 42(4):422–445
Birkin M, Clarke GP, Clarke M, Wilson AG (2014) The achievements and future potential of applied quantitative geography: a case study. Geogr Pol 87(2):179–202
Birkin M, Clarke GP, Clarke M (2017) Retail location planning in an era of multi-channel growth. Routledge, London
Clarke GP, Kashti A, McDonald AT, Williamson P (1997) Estimating small area demand for water: a new methodology. Water Environ J 11(3):186–192
Clarke GP, Langley R, Cardwell W (1998) Empirical applications of dynamic spatial interaction models. Comput Environ Urban Syst 22(2):157–184
Fischer MM, Getis A (1999) New advances in spatial interaction theory. Pap Reg Sci 78(2):117–118
Green HL (1961) Planning a national retail growth program. Econ Geogr 37(1):22–32
Harris B, Wilson AG (1978) Equilibrium values and dynamics of attractiveness terms in production-constrained spatial-interaction models. Environ Plan A 10(4):371–388
Hewings G (2019) Personal communication, 23/08/2019
Hewings G, Okuyama Y, Sonis M (2001) Creating and expanding trade partnerships within the Chicago metropolitan area: applications using a Miyazawa accounting system. In: Clarke GP, Madden M (eds) Regional science in business. Springer, Berlin, pp 11–36
Isard W (2003) History of regional science and the Regional Science Association International. Springer, Berlin
Kelly U, Marsh R, McNicoll I (2004) The impact of higher education institutions on the UK economy. University of Strathclyde, Glasgow
Newing A, Clarke GP, Clarke M (2015) Developing and applying a disaggregated retail location model with extended retail demand estimations. Geogr Anal 47(3):219–239
O'Donoghue C, Ballas D, Clarke GP, Hynes S, Morrissey K (eds) (2012) Spatial microsimulation for rural policy analysis. Springer, Berlin
Rahman A, Harding A (2014) Spatial analysis of housing stress estimation in Australia with statistical validation. Australas J Reg Stud 20(3):452–486
Ransome JC (1961) The organization of location research in a large supermarket chain. Econ Geogr 37(1):42–47
Rickman DS, Schwer RK (1995) A comparison of the multipliers of IMPLAN, REMI, and RIMS II: benchmarking ready-made models for comparison. Ann Reg Sci 29(4):363–374
Rogers DS, Green HL (1979) A new perspective on forecasting store sales: applying statistical models and techniques in the analog approach. Geogr Rev 69:449–458
Roy JR, Thill JC (2004) Spatial interaction modelling. Pap Reg Sci 83(1):339–361
Simmonds D (2001) The objectives and design of a new land-use modelling package: DELTA. In: Clarke GP, Madden M (eds) Regional science in business. Springer, Berlin, pp 159–188
Stimson R (2019) Personal communication, 17/07/2019
Thrall GI (2002) Business geography and new real estate market analysis. Oxford University Press, Oxford
Williamson P (2001) An applied microsimulation model: exploring alternative domestic water consumption scenarios. In: Clarke GP, Madden M (eds) Regional science in business. Springer, Berlin, pp 243–268
Wilson AG (1970) Entropy in urban and regional modelling. Pion, London
Wilson AG, Coelho JD, Macgill SM, Williams HC (1981) Optimisation in locational and transport analysis. Wiley, Chichester

Worlds Lost and Found: Regional Science Contributions in Support of Collective Decision-Making for Collective Action

Kieran P. Donaghy
Department of City and Regional Planning, Cornell University, Ithaca, NY, USA

Contents
1 Introduction
2 Ontologies Adequate to the World in Which We Plan
2.1 Ontologizing from Within Planning
3 Viewing Actors from the Perspective of Behavioral Economics and Complexity Theory
3.1 Endogenizing Social Structure
4 Formal Analysis of Subjunctive Reasoning
4.1 Subjunctive Reasoning in Policy Analysis
5 Causal Analysis
5.1 Example of Causal and Counterfactual Analysis in a Planning Support System
6 Modeling Collective Decision-Making in Design Behavior
7 Conclusion
8 Cross-References
References

Abstract

In this chapter I present a survey of contributions made by regional scientists in recent decades that provide capacity to spatial planners and urban designers to engage in collective decision-making and design leading to collective action by bringing into view relevant aspects of the world – as it is and as it could be. I begin by identifying critical features of analytical frameworks intended to support collective decision-making for collective action. I then offer an argument for why we need a more complete ontology than what has traditionally been invoked in spatial planning and discuss the sophisticated ontology in the planning data model of Lewis Hopkins, Nikhil Kaza, and Varkki George Pallathucheril. Noting that this ontology is noncommittal on underlying social behavior, I further discuss contributions from behavioral economics that might be integrated into Hopkins et al.'s framework and illustrate from the work of Yannis Ioannides how the determination of social structure in space can be endogenized. While acknowledging that the framework of Hopkins et al. supports analyses of transition paths to alternative futures, I argue for formalizing analyses of subjunctive reasoning, drawing on an important but neglected contribution by Walter Isard and Blane Lewis. I then proceed to a discussion of causal analysis in general that draws on the work of Nancy Cartwright and consider an example of how such analysis can be conducted in the planning support system of Brian Deal, Haozhi Pan, Stephanie Timm, and Varkki Pallathucheril. In the chapter's final substantive section, I briefly consider how Michael Batty is modeling collective decision-making in urban design.

Keywords

Social ontology · Behavioral economics · Social interactions · Possible worlds · Spatial planning · Collective action

1 Introduction

Since its inception as an applied social science focusing on socioeconomic problems in spatial contexts, regional science has had a close working relationship with regional (or, more generally, spatial) planning in the public domain, which John Friedmann (1988) has characterized as "the management of change in territorially organized systems." Scholars who identify as regional scientists have contributed much to the contemporary spatial planner's tool box. Their contributions include analytical frameworks for studying population growth and migration, measuring regional income and constructing social accounts, conducting economic impact analyses, characterizing flows of commodities and money, tracking regional business cycles, explaining industrial location and agglomeration economies, identifying potential industrial complexes, and understanding the transportation-land-use nexus and spatial interaction, inter alia. These frameworks have been employed pervasively in planning analyses over the last half century and have supported collective decision-making or design which has led to collective action, the point of planning in the public domain.

Arguably, the challenges facing spatial planners and urban designers today have become more daunting than those confronted several generations ago. The world in which planners endeavor to manage change is violently conflicted, fraught with widening inequalities, in the throes of environmental crises and significant demographic changes, and presents the challenge of managing dwindling common resource pools while coping with growing populations. At the end of an industrial epoch, cities and regions must free themselves from legacies of outmoded technologies and infrastructure systems, rebuild the decaying built environment, and transition to more sustainable lifestyles. The world is globalized (networked and increasingly interdependent) and composed of inextricably intertwined complex adaptive systems. Both the seriousness and the obviousness of our predicament have given rise to a new and busy area of scholarship in spatial planning: transition studies (Grin et al. 2010).

These developments suggest that spatial planners and urban designers will continue to look to regional scientists to provide assistance to support decision-making, planning, and design. However, the complexity of the problems now confronting spatial planners puts in question whether many of the models and methods contributed by regional scientists in the past, based as they are on strong simplifying assumptions, are up to the task of providing adequate planning support. The abstract "worlds" of these models and methods are perhaps worlds well lost.

There is a growing appreciation in planning and design fields that effective responses to today's urban and regional problems must reflect collective decision-making that leads to collective action. Collective decision-making, in turn, needs to be supported by analytical frameworks and processes that possess a number of critical features. These features include:

• Shared ontologies of planning environments, or an understanding of where working ontologies differ
• Realistic characterizations of behavior and explicit characterizations of the endogenous determination of social structures
• Formal analyses of subjunctive reasoning processes
• Analyses of underlying causal mechanisms that can indicate where policy interventions are appropriate
• Formal modeling of collective decision-making or design behavior to indicate what in the way of collective action is likely or feasible

Allow me to motivate each of these features briefly before proceeding to the main program of the chapter. As Lewis Hopkins has pointed out in multiple places (e.g., Hopkins 2001), planning is a multivoiced activity. No one plans alone; plans identify contingent paths between interdependent decisions that can be taken not only by individual people and organizations but possibly by many parties. In the public domain, plans inform decisions leading to collective action. With many people and communities making and using plans, some will conflict. Hence, it is often necessary to be able to share assumptions about what there is or could be – i.e., to compare ontologies.

Many planning scholars have come to appreciate through, inter alia, advances in complexity theory, behavioral economics, network science, psychology, and the economics of social interactions that there are more types of actor than "homo economicus," that social behaviors are influenced by context, and that decisions in different domains of people's lives are not only interdependent but also condition subsequent decisions in other domains through path-dependent relationships. Credible support of collective decisions needs to reflect these realities.

Much of planning and design work is of necessity based on "what-if" thinking that can be either prospective or retrospective. Most decisions leading to collective action are based on scenarios presented to the public or to relevant decision-making bodies. Questions about the realism of scenario assumptions and the plausibility of projected outcomes are routinely and rightly raised, but the validity of the reasoning used in selecting assumptions and reaching conclusions should also be queried. In cases of natural or man-made disasters, questions about whether outcomes could have been avoided or been different need to be addressed. Structural support is needed to conduct, examine, and validate subjunctive reasoning in the analysis of policies.

To manage change effectively, planning practitioners need to understand realistically what causal mechanisms are at work, in addition to what the concerns of stakeholders may be and what the nature of human behavior and the limits of our capabilities to manage our own affairs are. To explain factual matters acceptably and usefully to planning practitioners, designers, and policy makers, planning theorists and regional scientists need to reach adequate levels of granularity and causal depth in their theorizing (Donaghy and Hopkins 2006).

It is now widely understood that, to be successful, planning or design interventions in urban and regional systems must have political support from the "bottom up." Collective decision-making or design processes concerning changes in such systems are often structured to share expert knowledge, elicit ideas, and build consensus around action strategies. Often, too, information about the values, resources, and dispositions of the different actors interested in actions is available and can, with appropriate modeling frameworks, be used to anticipate how the relationships between actors may evolve, so as to identify possible or likely outcomes and indicate along what lines collective action may be feasible.

In this chapter I present a survey of contributions made by regional scientists in recent decades that provide capacity to spatial planners and urban designers to engage in collective decision-making leading to collective action by bringing into view relevant aspects of the world – as it is and as it could be. In the next section, I offer an argument for why we need a richer, more broadly based, and more realistic ontology than what has traditionally been used in spatial planning. I follow this up with a discussion of what I believe is the most sophisticated of prevailing ontologies of planning analysis, that of Hopkins et al. (2005). Noting that this ontology is noncommittal on underlying social behavior, I discuss in the subsequent section contributions from behavioral economics that might be integrated into Hopkins et al.'s framework and then illustrate from the work of Ioannides (2013) how the determination of social structure can be endogenized in social network models. While the framework of Hopkins et al. (2005) as a planning data model travels a considerable distance toward supporting analyses of transition paths to alternative futures, I argue that we may also want to attempt to formalize analyses of subjunctive reasoning (in terms of possible worlds) and so discuss an important but neglected contribution by Isard and Lewis (1984). I then proceed to a discussion of causal analysis in general that draws on the work of Cartwright (2007) and consider an example of how such analysis can be conducted in a planning support system of Deal et al. (2017). In the chapter's final substantive part, I briefly consider how Batty (2013) is modeling collective decision-making in urban design.
I conclude the chapter with a summary of the major themes and points which I have elaborated.


Before proceeding further, it will be useful to distinguish between decision support systems and planning support systems. Decision support systems integrate subject-specific knowledge bases and analytical tools with inference engines to provide users with recommendations of decision sets and probable outcomes of decisions taken. Planning support systems go further to indicate how decisions taken at one time or place are related to or condition other decisions taken at other times and places.

2 Ontologies Adequate to the World in Which We Plan

An ontology is simply, in Willard Van Orman Quine's (1953) phrase, an accounting of "what there is" or, as other philosophers put it, "the furniture of the world." A social ontology is an accounting of such artifacts as conventions or rule-based practices, social innovations and institutions, norms, rights and obligations, intentions, designs, and, yes, plans. Of course, in planning, as noted above, one also needs to account for nonsocial things – the natural and built environments – which condition our plan-making, plan-using, and other behaviors and our interactions with them.

As John Searle (1995) has observed, there is at most one physical world, and we as physically constituted beings reside within it. As such beings, we have the remarkable biological capacity to make something symbolize – or mean, or express – something beyond itself. "This capacity underlies not only language but all other forms of institutional reality itself" (Searle 1995, p. 2). This capability underlies all of our rule-based social practices and enables us to constitute new institutional objects which exceed the physical features of the underlying physical reality. For Searle a "social fact" (read as object or entity) refers to any entity whose being involves collective intentionality or shared meaning. (Most basically, intentionality is "aboutness" or has to do with what a conscious person is thinking about. Searle refers to institutional objects – norms, conventions, practices, and language games – as "social facts." With most linguistic philosophers, I will reserve the term "fact" for a true sentence or utterance, holding that it is the antics of objects – physical or social – that make our sentences true or false, hence facts or not.)

Although we may distinguish between physical and nonphysical realities (or social and nonsocial reality), all of our experiences of reality (from early childhood onward) are linguistically mediated. All languages are social (there are no private languages). As social beings, we learn the world and language simultaneously and develop our social identities as we do so. However, the limits of our world are not the limits of our language. As pragmatist philosophers have long suggested, we can learn about the physical and social worlds and obtain veridical knowledge – justified true belief – through error-correction processes. While we need to avoid what Wilfrid Sellars has called "the myth of the given," which denies the contributions that we bring to our encounters with reality, Susan Hurley cautions against committing the "myth of the giving," which attributes too much construction of experience to the observer.


Significant contributions have been made to the study of social ontology in the last few decades by scholars other than Searle – Hurley, Philip Pettit, and Elinor Ostrom, inter alia. Clearly there is much work to be done in accounting for such social objects as neighborhood effects, network externalities, and emergent properties of complex systems, with which spatial planners must be concerned. Much of our current understanding of social ontology owes to contributions of pragmatics, or the study of speech acts, in which one does something in saying something. In his book The Construction of Social Reality, Searle observes that "the great social theorists of the nineteenth and twentieth centuries lacked an adequate theory of speech acts, of performatives, of intentionality, of collective intentionality, [and] of rule governed behavior" (Searle 1995, p. xii) – all of which the explanation of planning behavior brings into play. I submit that to make better sense of collective action and to plan more effectively we need a richer ontology of social actors, institutions and institutional relationships, and relationships between people and things.

2.1 Ontologizing from Within Planning

As propaedeutic to laying out the elements of a planning support system (PSS) that can be used to explore whether and to what extent urban development plans made by different agents or communities are compatible (and can be used by those other than who made the plans to coordinate action), Hopkins et al. (2005) elaborate a planning data model (or PDM). Objects that exist in this model can be categorized as assets, actors, activities, and actions – what Kaza (2007) later terms the four As of planning. Figure 1 depicts "relationships among entities representing the world, entities representing changes in that world, and entities representing plans" (Hopkins et al. 2005, p. 602).

The entire system depicted in Fig. 1 may be viewed as a "state of the world." In this state, actors comprise persons, organizations (or groups of persons with roles, responsibilities, and decision rules), and populations (collections of persons or firms without organizational structures). Assets include buildings, networks (e.g., streets, electricity grids, and sewer systems), and areas designated for particular uses – so presumably the natural and built environments. Activities are practices in which actors engage, "such as residing, producing, consuming, or trip making" (Hopkins et al. 2005, p. 203). Activities are enabled by and conducted in, on, or with assets. Capabilities is a rather expansive category of entity that takes in characteristics of individuals (such as preferences, skills, or financial capabilities) as well as attributes individuals possess by virtue of their relationships to others (such as rights, responsibilities, authorities, and behavioral norms). Actions change assets and capabilities. Whereas assets are affected by investments (or presumably the lack thereof), which can create, destroy, expand, or contract them, actors' capabilities are affected by learning, regulations, and transactions. In the situations Hopkins et al. consider:

Plans are primarily about investments (changing assets) and changing capabilities. Actors make plans, perceive particular issues, make proposals for action, and have authority and influence in decision situations. In decision situations, actors use plans, confront issues and alternatives and make decisions for action. (Hopkins et al. 2005, p. 203)
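To make the four As concrete, the fragment below renders a slice of such a data model as plain data types. The class and field names are my own illustrative choices, a minimal sketch rather than the published PDM schema of Hopkins et al. (2005) or Kaza's (2007) markup language.

```python
from dataclasses import dataclass, field
from typing import List

# A toy rendering of the "four As" of a planning data model.
# All names here are illustrative, not the published PDM schema.

@dataclass
class Asset:
    name: str          # e.g., a street network or a building
    kind: str          # "building", "network", or "area"

@dataclass
class Actor:
    name: str
    capabilities: List[str] = field(default_factory=list)  # rights, skills, budgets

@dataclass
class Activity:
    name: str                                   # e.g., "residing", "trip making"
    uses: List[Asset] = field(default_factory=list)  # activities are conducted with assets

@dataclass
class Action:
    name: str
    actor: Actor
    changes_assets: List[Asset] = field(default_factory=list)        # investments
    changes_capabilities: List[str] = field(default_factory=list)    # learning, regulation

@dataclass
class Plan:
    maker: Actor
    proposed_actions: List[Action]  # plans relate contingent, interdependent actions

# Example: a municipality plans a sewer extension that would enable residing
sewer = Asset("north sewer extension", "network")
city = Actor("municipality", capabilities=["bonding authority"])
build = Action("extend sewer", city, changes_assets=[sewer])
plan = Plan(city, [build])
print(f"{plan.maker.name} plans: {[a.name for a in plan.proposed_actions]}")
```

Even this toy version shows why the ontology supports plan comparison: two plans expressed in the same types can be checked for shared assets or conflicting actions.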


Fig. 1 State of the world. (Source: Hopkins et al. 2005, reprinted with permission from SAGE Publications)

In providing the robust and well-populated ontology just sketched, which Kaza (2007) further refines and maps into a semantics that can be marked up in a standardized language to use in examining and comparing the contents of alternative plans, the authors have proceeded phenomenologically – describing from within planning practice a state of the world that is recognizable by planning practitioners, developers, public utilities directors, firms, residents, and other stakeholders or officials. To meet their objective of developing a planning data model, there is no need to exoticize what they are describing in the technical languages of social science or philosophy or to probe more deeply into the nature of the objects. And indeed the ontology they have provided will support scholarly conversations about relationships in the states of the world their model can describe.

If, however, we ask questions other than those motivating Hopkins et al. – especially questions about how we go about the business of managing transitions in, say, interdependent infrastructure systems that feature in the capital plans of multiple jurisdictions, so as to achieve more sustainable lifestyles – we will want to investigate further the objects of this world and the relationships between them. In so doing, we will inquire after the behavior of actors and causal relationships; hence, we may find further philosophical probing and scientific inquiry useful.

3 Viewing Actors from the Perspective of Behavioral Economics and Complexity Theory

Hopkins et al. (2005) are noncommittal about what models of rational behavior should inform a PDM or PSS. If we are to bring actual human capabilities, limitations, and relationships – social capital and networks – more sharply into focus, it is perhaps better to introduce a more complex caricature of human behavior than is provided by homo economicus with his/her instrumental reasoning solely in the service of self-interest (Sen 1977). Behavioral economists Dawnay and Shah (2005) suggest the following principles as candidates for informing economic (and other) reasoning when engaged in planning or policy analysis:

1. Other people's behavior matters: people do many things by observing others and copying; people are encouraged to continue to do things when they feel other people approve of their behavior.
2. Habits are important: people do many things without consciously thinking about them. These habits are hard to change.
3. People are motivated to "do the right thing": there are cases where money is de-motivating as it undermines people's intrinsic motivation.
4. People's self-expectations influence how they behave: they want their actions to be in line with their values and commitments.
5. People are loss averse and hang on to what they consider "theirs."
6. People are bad at computation when making decisions: they put undue weight on recent events and too little on far-off ones; they cannot calculate probabilities well and worry too much about unlikely events; and they are strongly influenced by how the problem/information is presented to them.
7. People need to feel involved and effective to make a change: just giving people the incentives and information is not necessarily enough (Dawnay and Shah 2005, p. 2).

(It is interesting to note that regional scientists Olsson and Gale (1968) anticipated many of the suggestions concerning characteristics of economic reasoning made more recently by behavioral economists as well as the use of Markov chain models to represent spatial behavior, discussed below in Sect. 6.)

We might add to these suggestions the observations of Elinor Ostrom that people have a propensity to learn social norms – shared understandings about actions that are obligatory, permitted, or forbidden – that are analogous to their propensity to learn grammatical rules or the rules of language games (Ostrom 2000). Moreover, how rules and norms evolve, especially those pertaining to common resource pool sharing, can be explained through an indirect evolutionary research strategy that has been developed over a number of decades of careful case study research. A particularly important finding of Ostrom and her colleagues is that the behaviors of actors can be of different types. In addition to rational egoists, Ostrom and her colleagues have identified those who are conditional cooperators and willing punishers, all of whom are important for stable resource management regimes.


Kahneman (2011) has challenged fundamentally our notions of rationality and has shown how we are in fact oblivious to many of our own cognitive abilities and processes. Complexity theory has raised profound questions about how we understand social behavior and aggregate effects at different scales and how we view urban phenomena (Durlauf 2005). This branch of systems theory has also given rise to new approaches to computer-based learning about complex adaptive systems which make intensive use of agent-based models.

Findings of complexity theory have stood some prevailing criticism of recent decades on its head. For example, macroeconomists of the representative agent school criticized Keynesians for using models that lacked micro-foundations – i.e., models whose equations could not be derived from a representative agent's constrained optimization problem. But a major point of the study of complex adaptive systems is that emergent phenomena simply do not have micro-foundations. Behavioral phenomena at higher scales are qualitatively different from behaviors aggregated up from lower scales.

Empirical methods have not fallen behind these theoretical developments. Yannis Ioannides's (2013) pathbreaking volume, From Neighborhoods to Nations: The Economics of Social Interaction, collects important results from the literatures on the economics of nonmarket-based social interactions, geo-informatics, and spatial econometrics to demonstrate how social constructs of contemporary society, such as evolving neighborhood effects, can be observed and modeled and how hypotheses about them can be confronted empirically. Significant developments in the study and operationalization of social capital and social networks have also been made. We consider now Ioannides's modeling of the endogenous formation of social structure.

3.1 Endogenizing Social Structure

Ioannides uses the conventions of graph or network theory to develop a dynamic model of individual decision-making, in which the individual is influenced by actions of others nearby. To recapitulate the development of this model, we will start with a given social structure and then address the deliberate choice of connections. The behavioral model we will consider can accommodate a broad range of instances of social interactions. These include consumption activities, human capital accumulation, and income-generating activities that may be subject to neighborhood and peer effects.

3.1.1 Assumptions

• Individual $i$ derives utility $U_{it}$ from an activity $y_{it}$ in period $t$ and incurs costs in terms of utility for maintaining his or her social connections.
• Individual $i$'s utility function $U_{it}$ is increasing and concave in $y_{it}$.
• $U_{it}$ depends on the actions and the characteristics of individual $i$'s neighbors (in the sense of social structure) and on the mean (average or expected) action and characteristics of the entire local economy.


3.1.2 Additional Notation

$y_t = (y_{1t}, y_{2t}, \ldots, y_{It})^T$ is a column vector with $I$ elements describing the actions of all $I$ individuals in the local economy at time $t$. $x_i$ is a row vector with $k$ elements describing individual $i$'s own characteristics, such as the number and ages of children in his or her household. Matrix $X$ is an $I \times k$ matrix, which stacks characteristics for all individuals.

$\bar{x}_{\nu_t(i)}$ is a row vector with $k$ elements describing the mean characteristics of $i$'s neighbors, $j \in \nu_t(i)$, where $\nu_t(i)$ denotes the set of $i$'s neighbors at time $t$. It is computed as $\bar{x}_{\nu_t(i)} = \frac{1}{|\nu_t(i)|}\,\Gamma_{it} X$, in which $\Gamma_{it}$ is an $I \times I$ adjacency matrix. $\bar{x}_{\nu_t(i)}$ represents the influence of the aggregate characteristics of individual $i$'s neighbors – e.g., mean income, age, education, and other demographic and life-cycle characteristics. These characteristics can matter either directly, as a matter of taste – people socializing with people similar to them – or indirectly, because such characteristics are determinants of other variables of interest.

$\iota$ (iota) denotes a column $I$-vector of ones, $\mathbf{I}$ is the $I \times I$ identity matrix, and $\phi$, $\theta$, $\alpha$, and $\omega$ are column vectors of parameters with $k$ elements.

3.1.3 Model Development

Ioannides assumes that individual $i$ at time $t$ enjoys socializing with other individuals with specific characteristics. He conveys this assumption through an expression for $i$'s taste for the average characteristics of his or her neighbors, $j \in \nu_t(i)$, namely $\frac{1}{|\nu_t(i)|}\,\Gamma_{i,t} X \phi$, which he calls a level local contextual effect. The marginal utility of one's own activity also depends on one's neighbors' mean characteristics, through the term $\frac{1}{|\nu_t(i)|}\,\Gamma_{i,t} X \theta\, y_{it}$, which he refers to as the marginal local contextual effect, and separately on one's own characteristics through the term $x_i \alpha\, y_{it}$.

Turning first to interactions that are symmetric, an individual $i$ at time $t$ can be affected by his or her neighbors' actions in two ways. One may be thought of as conformism: individual $i$ suffers disutility from a gap between his or her own current action and the mean action of his or her neighbors in the previous period, $y_{it} - \frac{1}{|\nu_{t-1}(i)|}\,\Gamma_{i,t}\, y_{t-1}$. Ioannides observes that this formulation allows the effect of neighbors' actions to depend on individual socioeconomic characteristics. It expresses effects of endogenous local interaction. If the adjacency matrix $\Gamma_{it}$ is modified to allow for loops, i.e., $\gamma_{ii} \neq 0$, one can introduce own-conformity or a habit-formation effect, such as when one dislikes changing one's own actions from period to period. Ioannides also allows for a global conformity effect through the gap between an individual's current action and the mean action among all individuals in the previous period, $y_{it} - \bar{y}_{t-1}$. This gap is intended to capture endogenous global interactions, albeit with a lag.

An individual $i$ incurs a utility cost for maintaining his or her connection with individual $j$ at time $t$, which is a quadratic function of the "intensity" of the social attachment, or of the synergistic effort, as measured by the $\gamma_{ij}$: $c_1\,\Gamma_{i,t}\,\iota - c_2\,\Gamma_{i,t}\Gamma_{i,t}'$, where $c_1, c_2 > 0$. (Ioannides observes that this component of the utility function is critical when weighted adjacency matrices are considered.)


Finally, Ioannides allows for an individual's own marginal utility to include an additive stochastic shock, $e_{it}$. Pulling together all these effects in a quadratic utility function, one obtains

$$
\begin{aligned}
U_{it} ={}& \underbrace{\frac{1}{|\nu_t(i)|}\,\Gamma_{i,t}X\phi}_{\text{level local contextual effect}}
+ \Bigg(\alpha_0
+ \underbrace{x_i\alpha}_{\substack{\text{marginal utility of}\\ \text{own characteristics}}}
+ \underbrace{\frac{1}{|\nu_t(i)|}\,\Gamma_{i,t}X\theta}_{\substack{\text{marginal local}\\ \text{contextual effect}}}
+ e_{i,t}\Bigg)\,y_{it} \\
&- \frac{1}{2}\big(1-\beta_g-\beta_l\big)\,y_{it}^2
- \underbrace{\frac{1}{2}\,\beta_g\big(y_{it}-\bar{y}_{t-1}\big)^2}_{\text{global conformity effect}}
- \underbrace{\frac{1}{2}\,\beta_l\Big(y_{it}-\frac{1}{|\nu_{t-1}(i)|}\,\Gamma_{i,t}\,y_{t-1}\Big)^2}_{\text{disutility from local conformism}} \\
&+ \underbrace{\frac{1}{|\nu_t(i)|}\,\Gamma_{i,t}\,y_{t-1}\,x_i\omega}_{\substack{\text{marginal effect of neighbors' attributes}\\ \text{dependent on individual's own characteristics}}}
- \underbrace{\big(c_1\,\Gamma_{i,t}\,\iota - c_2\,\Gamma_{i,t}\Gamma_{i,t}'\big)}_{\substack{\text{utility cost of}\\ \text{maintaining connections}}}
\end{aligned}
\tag{1}
$$

Terms on the right-hand side that do not involve $y_{it}$ do not affect individuals' reaction functions. The corresponding social effects are relevant nonetheless for the endogenous neighborhood choice process, which rests on total utility comparisons.

3.1.4 A Dynamic Model of Individual Decision-Making

Utility function (1) can be used to model decision-making of individuals that is influenced by considerations of neighbors' actions. At the beginning of period $t$, individual $i$ augments his or her information set $\psi_{i,t-1}$ with information about his or her own contemporaneous preference shock, $e_{it}$, and with public information that has become available during period $t-1$. This information includes the period-$t$ adjacency matrix $\Gamma_{it}$, which is taken to be exogenous; his or her neighbors' past actions, $\Gamma_{i,t}\,y_{t-1}$; and the mean lagged action for the entire economy, $\bar{y}_{t-1}$. He or she chooses an action plan, $\{y_{it}, y_{i,t+1}, \ldots \mid \psi_{it}\}$, so as to maximize the expected lifetime utility conditioned on $\psi_{it}$:

$$
E\left\{\sum_{s} \delta^{s-t}\, U_{i,s} \,\middle|\, \psi_{it}\right\},
\tag{2}
$$

where $\delta$, $0 < \delta < 1$, is the rate of time preference and individual utility is given as above in Eq. (1). Ioannides observes that individual $i$ recognizes that $y_{it}$ enters his or her next-period utility $U_{i,t+1}$ only if $\beta_g \neq 0$ – that is, only if endogenous global interactions are present. This reflects the fact that there is no habit-formation effect, $\gamma_{ii,t+1} = 0\ \forall t$, in which case the term $\Gamma_{i,t+1}\,y_t$ does not contain $y_{it}$. The first-order condition (FOC) with respect to $y_{it}$ given the social structure is

$$
y_{it} = \frac{\beta_l}{|\nu_t(i)|}\,\Gamma_{i,t}\,y_{t-1} + \beta_g\,\bar{y}_{t-1} + \delta\,\frac{\beta_g}{I}\,E\big[y_{i,t+1}-\bar{y}_t \,\big|\, \psi_{it}\big] + \alpha_0 + x_i\alpha_1 + \frac{1}{|\nu_t(i)|}\,\Gamma_{i,t}X\theta + e_{it}.
\tag{3}
$$

The interdependencies between individuals' reaction functions become clearer when the FOCs for all individuals are put in matrix form:

$$
\Big(\mathbf{I} + \delta\,\frac{\beta_g}{I}\,\mathbb{1}\Big)\,y_t = A_t + \Big(\beta_l\,\tilde{\Gamma}_t + \frac{\beta_g}{I}\,\mathbb{1}\Big)\,y_{t-1} + \delta\,\frac{\beta_g}{I}\,E\big[y_{t+1}\,\big|\,\psi_t\big] + e_t,
\tag{4}
$$

where $\mathbb{1}$ is an $I \times I$ matrix of ones and $A_t = \big(\ldots,\ \alpha_0 + x_i\alpha + \tfrac{1}{|\nu_t(i)|}\Gamma_{i,t}X\theta,\ \ldots\big)^T$ is a column $I$-vector of individual contextual effects, with

$$
\tilde{\Gamma}_t \equiv D_t\,\Gamma_t,
\tag{5}
$$

in which $\tilde{\Gamma}_t$ is the row-normalized adjacency matrix, an $I \times I$ positive matrix, and $D_t$ is an $I \times I$ diagonal matrix whose elements are the inverses of the sizes of the individual neighborhoods, $D_{ii,t} = 1/|\nu_t(i)|$, the reciprocal of the $i$th row sum of $\Gamma_t$.

Defining the conformity effect relative to the mean action of an individual's neighbors in the current period, $y_{it} - \frac{1}{|\nu_t(i)|}\,\Gamma_{i,t}\,y_t$, we can modify the system as

$$
\Big(\mathbf{I} + \delta\,\frac{\beta_g}{I}\,\mathbb{1}\Big)\,y_t = A_t + \beta_l\,\tilde{\Gamma}_t\,y_t + \frac{\beta_g}{I}\,\mathbb{1}\,y_{t-1} + \delta\,\frac{\beta_g}{I}\,E\big[y_{t+1}\,\big|\,\psi_t\big] + e_t.
\tag{6}
$$

The simultaneous presence of the vectors of current and lagged individual decisions, $y_{t-1}$, $y_t$, and the expectation of future ones, $E[y_{t+1} \mid \psi_t]$, complicates dealing with this model. If both $\beta_l$ and $\beta_g$ are zero, all social effects are absent, and the vector of individual outcomes simply equals individuals' own effects plus the contemporaneous random shock – $x_i\alpha + e_{it}$. Otherwise, in the presence of social effects, the LHS and RHS of Eq. (4) represent marginal lifetime utility costs and benefits for all individuals derived from an additional unit of their respective outcomes. An additional unit of $y_t$ increases marginal costs now by $y_t$ and next period via the conformism effect, which when discounted to the present is in aggregate $\delta\,\frac{\beta_g}{I}\,\mathbb{1}\,y$. Setting $\beta_g = 0$, and thereby excluding global interactions, the system of equations determining the endogenous variables becomes

$$
y_t = A_t + \beta_l\,\tilde{\Gamma}_t\,y_{t-1} + e_t,
\tag{7}
$$

where $e_t \equiv (e_{1t}, e_{2t}, \ldots, e_{It})^T$ is the $I$-vector of shocks.

The evolution of the state of the economy defined by $y_t$, the vector of individuals' actions, is fully described by Eq. (7), which is a VARX(1, 0) model – i.e., a vector autoregressive model with exogenous variables, a one-period lag in the endogenous variables, and no lags in the exogenous variables – given the information set $\psi_t = \cup_i\, \psi_{i,t}$ and provided that the sequence of adjacency matrices is specified. The economy evolves as a Nash equilibrium system of social interactions that adapts to external shocks of two types: deterministic ones, as denoted by the evolution of social structure, and stochastic ones, as denoted by the vectors of shocks, $e_t$.

Ioannides discusses how interactions in the steady state evolve and applies the model to the case where individuals respond to the incomes of others. He also explores the question of whether or not an individual may want to interact socially with others and the question of how network connections come about, and so begins to break open black boxes that heretofore have been closed to planning analysts. Bracketing their point about people being bad at computation, one can see how Ioannides's model can accommodate many of the other principles of economic reasoning observed by Dawnay and Shah (2005).
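To see how compactly Eq. (7) can be put to work, the sketch below simulates the no-global-interactions system on a randomly generated social structure and checks convergence toward its steady state. The network, parameter values, and shock scale are illustrative assumptions of mine, not values from Ioannides (2013).

```python
import numpy as np

rng = np.random.default_rng(0)
I, k, T = 50, 2, 200      # illustrative: individuals, characteristics, periods
beta_l = 0.6              # local conformity strength; |beta_l| < 1 keeps the system stable

# Random 0/1 social structure with no self-loops; a ring guarantees nonempty neighborhoods
Gamma = (rng.random((I, I)) < 0.10).astype(float)
np.fill_diagonal(Gamma, 0.0)
Gamma[np.arange(I), (np.arange(I) + 1) % I] = 1.0

# Row-normalized adjacency matrix: Gamma_tilde = D Gamma, D = diag(1/|nu(i)|)
Gamma_tilde = Gamma / Gamma.sum(axis=1, keepdims=True)

# Contextual effects A = alpha0 + X alpha + Gamma_tilde X theta (held fixed over time here)
X = rng.normal(size=(I, k))
alpha0, alpha, theta = 0.5, np.array([0.2, -0.1]), np.array([0.3, 0.1])
A = alpha0 + X @ alpha + Gamma_tilde @ X @ theta

# Iterate the VARX(1,0) system of Eq. (7): y_t = A + beta_l * Gamma_tilde y_{t-1} + e_t
y = np.zeros(I)
for t in range(T):
    y = A + beta_l * (Gamma_tilde @ y) + rng.normal(scale=0.1, size=I)

# With a fixed network the process fluctuates around the steady state y* = (I - beta_l G)^-1 A
y_star = np.linalg.solve(np.eye(I) - beta_l * Gamma_tilde, A)
print("mean |y_T - y*| =", np.abs(y - y_star).mean())
```

Because $\tilde{\Gamma}_t$ is row-stochastic, the spectral radius of $\beta_l\tilde{\Gamma}_t$ is $\beta_l$, so any $|\beta_l| < 1$ guarantees a unique steady state; letting the adjacency matrix itself change over the iterations would add the deterministic structural shocks described above.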

4 Formal Analysis of Subjunctive Reasoning

While Ioannides and others have shown us how to model and test theories about complex social interactions and their effects, as regional scientists working in the service of planners and designers, we would also like to pursue realistic comparative dynamic analyses and assess the validity of our thought experimentation, some of which concerns possible worlds. Hopkins et al. (2005) and Kaza (2007) are concerned with alternative futures, as are most planners and plans. The subject of "possible worlds," however, has to do with the truth of counterfactual statements about alternative futures (but also about alternative pasts and presents) and the validity of logical inferences about possible worlds. Possible worlds are defined as "maximally possible states of affairs," whereas a "state of affairs" is a relationship that either "obtains" or does not. Alternative "states of the world" that can be entertained within Hopkins et al.'s PDM are also states of affairs.

The question of possible worlds, not just alternative futures, arises in the context of planned transitions. We want to know not only about the status of future states of affairs in plans, in the sense of feasibility or compatibility with other states of affairs, but also how sound our reasoning about them is. I would argue that to make progress in answering this question we need two developments (a toy illustration of the second follows this list):

1. A planning support system that would enable stakeholders to play out sequences of decisions, which are related to other decisions made by themselves and other stakeholders, and to observe the systemic implications of these decisions, enabling them to learn how easy or difficult it may be to achieve some intended transition – how reachable some desired social state of affairs may be

2. A dynamic truth-functional modal logic engine to establish the truth values of propositions about future states of affairs and transition paths to them

Regarding the first development (or desideratum), it is evident that some decisions' outcomes are contingent upon others' and that path dependence plays a critical role in conditioning what states of affairs can be realized. This point is trivially obvious to any land-use planner or developer in the context of a neighborhood development, but it also becomes obvious when one adopts a higher, broader, and longer-term perspective on the evolution of regional systems – e.g., a region afflicted with "Dutch disease" or the so-called resource curse. Budget constraints and other political realities (lack of political will, conflicting values, legislative gridlock, etc.) also constrain what can be done (say, in a budget cycle), leading many to conclude that "we can't get there from here." Moreover, it is not uncommon for actors who have set out upon some course of action not to follow through, and not just for reasons of akrasia, or weakness of the will. Time-inconsistent policies may make perfect sense from the perspective of some participants, e.g., the leaders, in dynamic Stackelberg (leader-follower) games.

Regarding the second development (or desideratum), we are interested in determining (often challenging) the truth values of propositions about what would happen under different plans in alternative futures. Strong claims are advanced every day by parties interested in the advancement of pet projects – e.g., gas companies wishing to drill in rural areas of the Southern Tier of New York State, companies with stakes in the Keystone pipeline from Canada to southern US ports, or proponents and opponents of genetically modified crops. We raise, entertain, and challenge claims not only about the feasibility, safety, and systemic implications of states of affairs but also about the validity of the reasoning in arguing for them. We would like to check both types of claims in what Robert Brandom (1994) has termed the "score-keeping" of our normative pragmatics. Some logicians, such as Quine, have been skeptical about what the introduction of modal logical operators contributes to our assessment of the truth of modal statements. Much of the controversy about possible worlds has had to do with argumentation about counterfactual suppositions. An example is the question of what might have been the role of the US military in the Southeast Asian conflict of the 1960s and 1970s if the Gulf of Tonkin incident had not been staged. Clearly, we are interested in asking questions about alternative futures, pasts, and presents as well as possible worlds, even if some believe such questions defy determinative answers.

My sense is that Kaza's (2007) work on the substitutability of alternative plans (as designs) enables us to get at suitably well-defined questions about alternative futures as a first approximation. The challenge is to be able to play out multiple rounds of decision-making in which it is (or may be) assumed that learning by actors occurs and that the decisions made close some options while leaving others open (or while creating others). In some cases, implementing decision tree analysis may be helpful, but as such authors as Stokey and Zeckhauser have admonished us in their discussion of the methodology, it is not helpful in situations where alternatives are not sharply defined and are complex and where the probabilities of outcomes are themselves endogenous.
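To make the second desideratum a little more concrete, the fragment below evaluates reachability claims over a toy transition system of states of affairs – a minimal Kripke-style reading of "possibly p" along transition paths. The states, transitions, and propositions are invented for illustration; a genuine engine of the kind called for here would be far richer (tense operators, valuations that evolve, and institutional context).

```python
from collections import deque

# Illustrative transition system: states of affairs and the decisions linking them
transitions = {
    "status quo":             ["sewer extended", "budget cut"],
    "sewer extended":         ["district rezoned"],
    "budget cut":             [],
    "district rezoned":       ["transit corridor built"],
    "transit corridor built": [],
}

# Propositions true in each state (a toy valuation)
valuation = {
    "transit corridor built": {"sustainable transition achieved"},
}

def possibly(prop: str, start: str) -> bool:
    """True iff some transition path from `start` reaches a state where `prop`
    holds -- the reachability reading of the modal 'diamond' operator."""
    seen, queue = set(), deque([start])
    while queue:
        state = queue.popleft()
        if prop in valuation.get(state, set()):
            return True
        if state not in seen:
            seen.add(state)
            queue.extend(transitions.get(state, []))
    return False

print(possibly("sustainable transition achieved", "status quo"))  # True
print(possibly("sustainable transition achieved", "budget cut"))  # False: option foreclosed
```

The second query illustrates the path-dependence point above: once the "budget cut" branch is taken, the desired state of affairs is no longer reachable, and the modal claim about it changes truth value.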


So Hopkins et al. (2005) and Kaza (2007) give us frameworks that enable us to consider comparisons of alternative built-out designs. An interesting question is whether we can build upon their work, incorporating behavioral economic characterizations of rationality and Ioannides’s (2013) innovations in modeling neighborhood effects, so as to be able to compare dynamic trajectories and make supportable judgments about which states of affairs are reachable.

4.1 Subjunctive Reasoning in Policy Analysis

In 1984 Walter Isard and Blane Lewis published a paper entitled "James P. Bennett on subjunctive reasoning, policy analysis, and political argument." As the paper's title suggests, Isard and Lewis were digesting ideas from a longer, unpublished work by Bennett on how subjunctive reasoning is deployed in political argumentation, especially in policy analysis. The policy question that Bennett used to illustrate his points was no-first-use of nuclear weapons by the United States and the different arguments about this policy advanced, respectively, by McGeorge Bundy, George F. Kennan, Robert S. McNamara, and Gerard Smith (in the policy journal Foreign Affairs) on the one hand and Kurt Gottfried, Henry W. Kendall, and John M. Lee (in Scientific American) on the other. Among the key points made by Bennett and underscored by Isard and Lewis were the following:

1. In doing policy analysis, we must conduct subjunctive reasoning, defined to be reasoning about counterfactual situations or things that might have been true under different conditions. It embraces "what-if" thinking. Subjunctive reasoning is unavoidable.

2. Formal logic provides little support for describing how political argument, based as it must be on subjunctive reasoning, is conducted or for normatively guiding how it should be done.

3. One type of logic related to possible worlds semantics is suggestive for both descriptive and normative applications. But this type of logic typically ignores the central feature of policy prescription – the construction and evaluative probing of the internal structure of possible worlds and the courses of action by which they may be reached or accessed.

4. Some of the best examples of policy advocacy make liberal use of or refer to external textual materials, such as histories, stories, and myths.

5. When policy advocates make explicit assertions, they rely upon pragmatic and syntactic presuppositions that they make and that they anticipate their readers will make, to establish the coherence of their arguments and to refer to external materials.

6. Features 2–4 suggest that adequate representation of subjunctive reasoning in political argument must utilize:
(a) An approach to possible worlds that is constructivist, i.e., that shows in detail the realistic or substantively meaningful steps or ways by which one transitions from the present world to another world (in a word, plans)
(b) Extended types of textually oriented data, which are not simply numbers, but which include narratives that organize the reader's understanding of the political process and events
(c) A significantly enriched model of argument in which formal analysis is applied to the relevant quantitative and qualitative data after implicit assumptions, cause-effect relationships, and implicit action output processes have been stated and supported

7. Careful study of how policy advocacy is accomplished under conventional criteria is necessary to develop an adequate language (lexicon, syntax, semantics, and limited pragmatic features) for modeling political argument.

The argument of Isard and Lewis is interesting and relevant to regional scientists in part because it contains many points that Isard would return to later in his valedictory comments about how the field of regional science should advance methodologically. But it also represents a road not taken (or taken further) by either regional scientists or peace scientists. The need for establishing higher standards for counterfactual analyses of (or subjunctive reasoning about) regional problems and appropriate policy responses has become increasingly apparent, however, as criticisms of integrated assessments of impacts of climate change; increasing social and economic inequality; expanding regional political, military, and refugee crises; and the unwinding of globalization, among other problems, have mounted.

Isard and Lewis (after Bennett) identify strengths but also limitations of possible worlds semantics in such cases when pursued without adequate institutional context. Alternatively, counterfactual analyses based on extrapolations from historical institutional content alone raise questions about the validity of the inferences drawn. We want to know not only about the status of future states of affairs in policies and plans, in the sense of feasibility or compatibility with other states of affairs, but also how sound our reasoning about them is. Clearly, it would seem we need to work simultaneously within what Sellars termed the "space of causes" and the "space of reasons." Isard and Lewis's arguments (based on Bennett's) suggest that to make progress in answering questions of feasibility and logical validity we need to analyze causality and counterfactuals to gain deeper understandings of causal relationships that are critical to the management of transitions in spatial systems. Their arguments also suggest that (as noted above) we need planning support systems to enable stakeholders to learn about how reachable some desired social state of affairs may be and a dynamic truth-functional modal logic engine to establish the truth values of propositions about future states of affairs and transition paths to them.

5 Causal Analysis

Certainly the analysis of causal mechanisms at work in the physical and social worlds will be critical to determining what types of collective action are appropriate to take in responding to socioeconomic problems in spatial contexts. One of the most highly regarded philosophers of science writing about causality and counterfactuals today is Nancy Cartwright. She is both a “realist” and a “particularist.” About causation Cartwright writes “. . . there is great variety of different kinds of causes and that even causes of the same kind can operate in different ways . . . [Hence] the term ‘cause’ is highly unspecific. [Its use] commits us to nothing about the kind of causality involved nor about how the causes operate. Recognizing this should make us more cautious about investing in the quest for universal methods for causal inference” (Cartwright 2007, p. 2). In her book, Hunting Causes and Using Them, she examines closely the accounts of causal laws (regularities) that are dominant today. These include:

• Probabilistic theories of causality and consequent Bayes-nets methods of causal inference or analyses based on “Granger causality”
• Modularity accounts
• Invariance accounts
• Natural experiment accounts
• Causal process theories
• Efficacy accounts

While each of these approaches has something (even much) to recommend it, she argues that each faces difficulties that prevent it from being accepted as a universal approach. Cartwright argues instead for “thick causal concepts” relevant to particular situations. And since the word “cause” – while essentially meaning to bring about some kind of effect – has so many instantiations and substitute terms, formalization of particular causal arguments is important for rigorous analysis. She further argues that many counterfactual analyses conducted in economics (or the social sciences, more broadly) today do not provide genuine answers to “what if?” questions about policies – especially randomized controlled trials (RCTs). They provide instead tools for measuring causal contributions in very specific types of situations – “Galilean experiments” in which only one factor (or a few) may be changed against a ceteris paribus background. The upshot of her argument is that we may (even must) use sophisticated analytical tools but must be savvy about the context and causal relations of the specific states of affairs we study.

Influenced strongly by the work of the philosopher of science Richard Miller, Jochen Runde (1998) goes further to argue against the search for lawlike causal regularities in economics – and by extension planning and regional science – and in support of identifying causal mechanisms that bring about particular events or classes of events. Like Miller, Runde recognizes that establishing a credible account of causality relative to a specific audience is a social accomplishment realized within a discipline through fair comparison of competing explanations; it is not an outcome that can be achieved algorithmically.

5.1 Example of Causal and Counterfactual Analysis in a Planning Support System

Causal relationships characterized by planning support systems (PSSs) are presently being investigated via multidirectional temporal analysis. One example is provided by Deal et al. (2017). The authors argue that PSS-based scenario planning processes and outcomes might be improved by including multidirectional temporal analysis, because methods of such analysis enable users to see more deeply into the potential consequences of the scenarios considered. Deal et al. demonstrate the benefits of multidirectional temporal analysis with comparisons of actual and simulated population development patterns obtained under different scenarios they examine in simulations for McHenry County, Illinois, using the LEAM planning support system, which embodies a very large spatial-dynamic cellular-automata model. They consider four types of temporal analysis: forecasting, backcasting, recasting, and pastcasting (see Fig. 2; a schematic code sketch of these four modes follows Figs. 3 and 4):

(i) Forecasting is the most common approach taken in scenario planning. A typical forecast starts from a (near) current condition and projects to a future state – this usually refers to the land-use changes that might occur over some specified time period.
(ii) Backcasting is the reverse of forecasting – the model starts from a future state and draws a developmental path back to the current condition. This is useful for plotting a path that responds to “how do we get there” questions.
(iii) Recasting is a reconstruction of the present. It uses forecasting techniques that start from a condition set in the past and project to the current state, usually for comparison purposes (from a projected current state to the actual state). This type of analysis is useful for calibration purposes and for understanding a previously unforeseen condition that emerges in the present state.
(iv) Pastcasting starts from a current time point (again, not necessarily the current state; it may often be a virtual, more preferred “current” state) and draws a developmental path back to a previous point in time. This approach is useful for understanding the processes that took place (or should have taken place) in order to arrive at the current or virtual state (Figs. 3 and 4).

Fig. 2 Multidirectional analysis using forecasting, backcasting, recasting, and pastcasting to and from the present or current condition. (Source: Deal et al. 2017, reprinted with permission from Elsevier)


Fig. 3 Forecasting and backcasting. (Source: Deal et al. 2017, reprinted with permission from Elsevier)

Fig. 4 Pastcasting and recasting. (Source: Deal et al. 2017, reprinted with permission from Elsevier)
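To make the four temporal modes concrete, the following is a minimal Python sketch that runs a toy land-use transition rule forward for a forecast and searches over candidate starting states for a backcast; recasting and pastcasting reuse the same machinery anchored at past or virtual “current” states. The transition rule, zone values, and names are hypothetical illustrations and bear no relation to the actual LEAM cellular-automata mechanics.

# Minimal sketch of forecasting and backcasting with a toy land-use model.
# All rules and parameters are hypothetical, not those of LEAM.
def step(state, growth=0.03):
    # One period: each zone's developed share grows with neighboring development.
    n = len(state)
    return [min(1.0, s + growth * (state[(i - 1) % n] + state[(i + 1) % n]))
            for i, s in enumerate(state)]

def forecast(state, periods):
    # Forecasting: project forward from a (near) current condition.
    path = [state]
    for _ in range(periods):
        path.append(step(path[-1]))
    return path

def backcast(target, periods, candidates):
    # Backcasting: pick the starting state whose forward path best reaches the
    # target future state; the resulting path answers "how do we get there."
    gap = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
    start = min(candidates, key=lambda c: gap(forecast(c, periods)[-1], target))
    return start, forecast(start, periods)

current = [0.2, 0.5, 0.1, 0.4]               # developed share by zone (hypothetical)
desired = forecast(current, periods=10)[-1]  # a future state of interest
start, path = backcast(desired, 10, [current, [0.1] * 4, [0.3] * 4])
print(start, path[-1])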


As with any type of thought experimentation, the issue of validation of PSS-generated experimental findings (especially counterfactual findings) must be addressed. The authors note that adopting multiple temporal perspectives may help us to identify what may be missing from a model and suggest that the usefulness of scenario analyses trumps accuracy in the case of planning, but they do not address the issue of thought experiment validation head-on. (This is not a criticism, since this is not what they set out to accomplish.) We should note, however, that even accurate forecasts (or pastcasts) may not validate the findings of a thought experiment if the PSS employed does not embody a reasonable characterization of the causal mechanisms that are bringing about the developments of interest. (The most accurate vector autoregressive, or VAR, models commonly used in econometric forecasting tell us nothing about the underlying structures of the economies they “represent.”) Whether or not one has the right causal mechanisms in a model to support planning analyses of alternative development paths must be argued for on grounds other than forecast accuracy.

And it will be difficult to clinch the case for multidirectional temporal analysis on the grounds of usefulness alone. (One might reasonably argue that the exercise of planning commissioners sitting around a Ouija board is also useful, because it leads them to consider eventualities they otherwise might not have!) Too often in policy and planning environments, people purchase the advice they consider “useful” in the sense of “wanting to hear.” If we wish to use a PSS to explore instructively deviations from a path of development – whether prior, current, or future – we also need to be able to say which modeled deviations are plausible, which statements about possible states of affairs are true. Recent developments in dynamic modal logic would appear to offer help in assessing such arguments, where we are able to code them into the right analytical engine (see, e.g., Galliani 2013).
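A small reachability sketch can suggest the flavor of what such an analytical engine might check, though plain breadth-first search over hypothetical states stands in here for a genuine dynamic modal logic of the kind Galliani discusses. Evaluating “possibly p” amounts to finding a transition path to a state satisfying p, and the witnessing path is exactly the constructivist “plan” called for above.

from collections import deque

# Hypothetical states of affairs and feasible transitions between them.
transitions = {
    "status_quo": ["modest_retrofit", "carbon_tax"],
    "modest_retrofit": ["regional_grid"],
    "carbon_tax": ["regional_grid", "backlash"],
    "regional_grid": ["sustainable_energy_system"],
    "backlash": ["status_quo"],
    "sustainable_energy_system": [],
}

def eventually(start, proposition):
    # Return a path witnessing that `proposition` holds in some reachable
    # state (the modal "possibly"), or None if no such state is reachable.
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if proposition(path[-1]):
            return path
        for nxt in transitions[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(eventually("status_quo", lambda s: s == "sustainable_energy_system"))
# -> ['status_quo', 'modest_retrofit', 'regional_grid', 'sustainable_energy_system']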

6 Modeling Collective Decision-Making in Design Behavior

Much of Michael Batty’s recent book, The New Science of Cities, is intended to convey what has been accomplished over the last few decades in modeling complex adaptive systems behaviors over hierarchical urban networks and to illustrate how tools for modeling such behaviors can be incorporated into decision support or planning support systems. In chapter “A Theory for Collective Action” of Part III of his volume, however, Batty discusses how modeling the design of urban systems by different agents can be used to support the creation and analysis of plans and policies for communities. Batty is concerned with design and decision-making as processes of resolving conflict or reaching consensus between relevant stakeholders that will enable collective action to be taken. To model the evolution of relationships between actors that can lead to collective action, Batty draws on conceptual and mathematical tools introduced by the sociologist James Coleman. Coleman’s (1964) theory of collective action is based on a set of assumptions relating actors to factors, resources, and interests:


. . . each actor j ascribes a certain value Vl to the degree of control clj possessed over the factor l. The actor is able and willing to influence events according to his or her interest in any factor xjl and according to available power or resources rj. In this context, power is no longer a relational concept between actors but a measure of influence based on an actor’s ability to act by using his or her resources. Coleman argues that in a system of perfect exchange, actors will move to an equilibrium in which the value of their control over a particular factor is equal to their resources, which are allocated to that factor in proportion to their interests. Such a situation can be brought about in several ways: by altering control, by altering interests, or by altering factor values or resources through social exchange. (Batty 2013, p. 373)

The relationships are depicted in Fig. 5.

Fig. 5 Structure of the decision-making system: actors, problems, policies, and factors linked by relations of interest, control, power, importance, significance, and relevance. (Source: Batty 2013, reprinted with permission from the MIT Press)

Batty elaborates how a formal characterization of the relationships at the core of Coleman’s theory can be developed using Markov chains and how the equilibrium to which the model converges is suggestive of a course of collective action. He provides illustrations of applications of this modeling approach to inner-city poverty and resource allocation in a local authority. If pursued further, this work would seem to have a great potential for identifying promising formulations of projects around which support might converge.
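A rough sense of the mechanics can be conveyed with a small fixed-point iteration – a simplified reading of a Coleman-type exchange system, not Batty’s actual implementation, with hypothetical interest and control matrices. Factor values derive from the power of the actors who control them, actor power derives in turn from actors’ interests in valuable factors, and the iteration converges to an equilibrium of the kind Batty interprets as suggestive of a course of collective action.

import numpy as np

# Hypothetical matrices for three actors and two factors.
interest = np.array([[0.7, 0.3],      # interest[j, l]: actor j's interest in
                     [0.2, 0.8],      # factor l (rows sum to 1)
                     [0.5, 0.5]])
control = np.array([[0.5, 0.1],       # control[j, l]: actor j's share of
                    [0.3, 0.6],       # control over factor l (columns sum to 1)
                    [0.2, 0.3]])

power = np.ones(3) / 3                # initial actor resources
for _ in range(200):                  # fixed-point (power) iteration
    value = control.T @ power         # factor value from who controls it
    power = interest @ value          # actor power from interests in factors
    power /= power.sum()              # power is relative, so normalize

print("equilibrium actor power:", power.round(3))
print("equilibrium factor value:", (control.T @ power).round(3))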

7 Conclusion

In this chapter I have presented a survey of contributions made by regional scientists in recent decades that provide capacity to spatial planners and urban designers to engage in collective decision-making leading to collective action by bringing into view relevant aspects of the world – as it is and as it could be. I began by identifying critical features of analytical frameworks supporting collective decision-making for collective action. I offered an argument for why we need a more complete ontology than what has traditionally been invoked in spatial planning and discussed the sophisticated applied ontology in the planning data model of Hopkins et al. (2005). Noting that this ontology is noncommittal on underlying social behavior, I further discussed contributions from behavioral economics that might be integrated into Hopkins et al.’s framework and illustrated from the work of Ioannides (2013) how the determination of social structure in space can be endogenized. While acknowledging that the framework of Hopkins et al. (2005) supports analyses of transition paths to alternative futures, I argued for formalizing analyses of subjunctive reasoning, drawing on an important but neglected contribution by Isard and Lewis (1984). I then proceeded to a discussion of causal analysis in general and considered an example of how such analysis can be conducted in the planning support system of Deal et al. (2017). In the chapter’s final substantive section, I briefly considered how Batty (2013) is modeling collective decision-making in urban design.

Much of my motivation for focusing on the set of issues considered in this chapter derives from my involvement in a study group seeking to understand how a transition to a more sustainable energy system on a regional scale in the Northeastern United States can be facilitated. Transitions of this sort, which are societal in scope and ring changes through industries and communities, cannot be imposed from above – that is to say, socially engineered. If behavioral economists and political scientists are correct, such transitions must be brought into being by the collective action of multiple governments at multiple levels, agencies, firms, utilities, households, and environmentalists, i.e., many stakeholders. My intuition is that the planning work associated with facilitating historic changes of this sort can be enabled by planning support systems and planning data models that can be employed to explore the compatibility of different plans as designs and strategies and the reachability of certain states of affairs in communities and regions. As I hope this chapter has made clear, regional scientists have a significant role to play in supporting this important work.

8 Cross-References

▶ Geospatial Analysis and Geocomputation: Concepts and Modeling Tools
▶ Regional Science Methods in Planning
▶ Spatial Interaction Models: A Broad Historical Perspective
▶ Spatial Network Analysis

References

Batty M (2013) The new science of cities. The MIT Press, Cambridge, MA
Brandom RB (1994) Making it explicit: reasoning, representing, and discursive commitment. Harvard University Press, Cambridge, MA
Cartwright N (2007) Hunting causes and using them. Cambridge University Press, Cambridge
Coleman JS (1964) Introduction to mathematical sociology. The Free Press, New York
Dawnay E, Shah H (2005) Behavioural economics: seven principles for policy-makers. New Economics Foundation, London
Deal B, Haozhi P, Timm S, Pallathucheril V (2017) The role of multidirectional temporal analysis in scenario planning exercises and planning support systems. Comput Environ Urban Syst 64:91–102
Donaghy KP, Hopkins LD (2006) Coherentist theories of planning are possible and useful. Plan Theory 5:173–202
Durlauf S (2005) Complexity and empirical economics. Econ J 115:F225–F243
Friedmann J (1988) Planning in the public domain: from knowledge to action. Princeton University Press, Princeton
Galliani P (2013) The dynamification of modal dependence logic. J Log Lang Inf 22(3):269–295
Grin J, Rotmans J, Schot J (2010) Transitions to sustainable development: new directions in the study of long term transformative change. Routledge, New York
Hopkins LD (2001) Urban development: the logic of making plans. Island Press, Washington, DC
Hopkins LD, Kaza N, Pallathucheril VG (2005) Representing urban development plans and regulations as data: a planning data model. Environ Plann B Plann Des 32:597–615
Ioannides YM (2013) From neighborhoods to nations: the economics of social interactions. Princeton University Press, Princeton
Isard W, Lewis B (1984) James P. Bennett on subjunctive reasoning, policy analysis, and political argument. Confl Manag Peace Sci 8:71–112
Kahneman D (2011) Thinking, fast and slow. Farrar, Straus and Giroux, New York
Kaza N (2007) Reasoning with plans: inference of semantic relationships among plans about urban development. Dissertation, University of Illinois at Urbana-Champaign
Olsson G, Gale S (1968) Spatial theory and human behavior. Pap Reg Sci Assoc 21:229–242
Ostrom E (2000) Collective action and the evolution of social norms. J Econ Perspect 14(3):137–158
Quine WV (1953) On what there is. In: Quine WV (ed) From a logical point of view. Harvard University Press, Cambridge, MA, pp 1–19
Runde J (1998) Assessing causal explanations. Oxf Econ Pap 50:151–172
Searle JR (1995) The construction of social reality. The Free Press, New York
Sen AK (1977) Rational fools: a critique of the behavioral foundations of economic theory. Philos Public Aff 6(4):317–344

Section II: Location and Interaction

Travel Behavior and Travel Demand

Kenneth Button

Contents
1 Introduction
2 The Behavior of Individuals
3 Modeling Travel Behavior and Demand
4 The Elasticity of Travel Demand
5 Using Travel Behavior and Travel Demand Information
6 Conclusions
References

Abstract

This chapter focuses on the ways in which travel behavior and demand are analyzed within the framework of regional science. Unlike several recent surveys that cover the more technical and abstract aspects of mathematically modeling travel behavior and demand, the attention here is more on the practical aspects of applying travel behavior and demand analysis to subjects such as regional development, infrastructure investment, and congestion analysis. Thus, while the main methods of modeling travel behavior and demand are outlined and critiqued, there are also considerable references to such things as demand elasticities and their estimation that are at the core of applied regional analysis. These types of parameter provide a direct link between a soft policy shift or a harder infrastructure investment, travel behavior, and ultimately the implications of this for regions. There is also discussion of the uses made of the forecasts that are the de facto rationale for studying travel behavior and travel demand, and the ways that neutral forecasting can be manipulated in decision-making.

K. Button (*)
Schar School of Policy and Government, George Mason University, Arlington, VA, USA
e-mail: [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_46


Keywords

Public transportation · Travel behavior · Travel demand · Toll road · Class fare

1 Introduction

Whatever we do takes up time; it takes time to watch a movie, to appreciate a good meal, or to see a favored soccer team lose. Time is important; why else do fast-food restaurants flourish? We like to save time to allow us to do more things over our limited life spans, in particular because there is considerable uncertainty about the latter’s duration. In a more prosaic context, we often try to save time to increase our economic productivity, or perhaps our employers do, because, after all, as the old proverb says, “time is money.” Time-saving is important in influencing travel behavior and travel demand, as in most other activities, sometimes more so and sometimes less.

The relationship is, as in all other similar matters, a complicated one because, from a behavioral perspective, time is never consumed in isolation. Just as going to the opera involves consuming time, it also means spending money. Traveling thus costs money in several ways, be it a fare, fuel, or shoe leather, and involves some form of final consumption, and not just at the end of a journey, although this is often emphasized. In this sense, the demand for travel is also derived from what is to be “consumed” at the end of a trip, be it a work or leisure final activity; in economic terms time is a joint product. Travel, therefore, entails considering the consumption not just of time but also of money and other things, as well as enjoying a number of benefits at the end of the trip and, in some cases, reading, sightseeing, listening to the radio, or enjoying having your Lamborghini admired during the trip. It is not surprising, therefore, that in practice transportation analysts are often very bad at predicting travel behavior, particularly when there are major changes in underlying parameters over time or when some of the costs and benefits are not easily quantified.

2 The Behavior of Individuals

The focus here is exclusively on the travel behaviors of individuals; it does not consider the movement of goods or of information electronically, other than in cases when these may impact on personal movement. The overlap can, however, be rather more extensive than is often reflected in policy-making and academic analysis. In most cases, individuals accompany goods when they move, trucks would be immobile without truck drivers, and the personnel involved have their own individual traits. In many cases, goods movements use the same infrastructure as individuals or the same piece of mobile plant, for example, passengers on the top deck of an aircraft and freight in the belly hold. The electronic transportation of information is important when it substitutes for a personal movement, for example, in the context of teleworking, or when it facilitates personal travel, as with airline computer booking systems or with automobile route guidance systems. We also say relatively little about where the origins and destinations of trips are located, but there are clear links between where people choose to live and work and transportation facilities, which in turn feed back on their travel behavior.

Additionally, handbooks have many purposes and many potential readerships, ranging from those concerned with the highly theoretical and abstract to those wanting a quick guide to finding a practical, rough-and-ready solution to an immediate real-world problem. (Volumes dealing with the latter are often, with a degree of derision, described as manuals, but their approach is no different to “handbooks” providing recipes for developing abstract theorems.) Here we seek to occupy the middle ground, both by discussing many of the widely used practical approaches to analyzing travel behavior and travel demand, making use of our knowledge of the subject, and by offering some guidance as to the direction in which research at a more theoretical level is moving. While there will be some discussion of model derivation, in the spirit of handbooks, the emphasis is on setting down what we have and what is currently being used, rather than plowing through all of the intellectual and mechanical background in detail.

3 Modeling Travel Behavior and Demand

Travel behavior is like any other activity, possibly excluding religion and politics, in that rational economic forces largely drive it. People base their decisions on the benefits they will enjoy from it, either directly or at the end of the trip, constrained by the generalized costs of the movement in terms of time and money, relative to available resources and the opportunity costs of using them in some other activity. Put this way, the study of travel behavior may seem rather trite; it is basically a constrained maximization problem. The devil, as is often the case, lies in the detail.

At the outset, it is important to distinguish between travel behavior and travel demand. The latter is a particular influence on the former. Travel behavior is what people actually do, the way that they behave, the trips they make, and the forms of transportation that they use. It is basically what the collection of transportation data on person and goods movements reflects and what those who invest in transportation or manage transportation assets try to forecast. Travel demand is one part of this and reflects what people’s travel behavior would be with various forms of transportation facilities available. It makes no allowance for the roads or transit facilities that exist, other than to extrapolate in some cases their current use to forecast future travel behavior. Thus, while travel behavior is dependent on travel demand, it is not only the demand for travel that influences final travel behavior outcomes but also the supply of transportation facilities.

Travel behavior can also be very variable in its nature, and virtually all analysis has to be taken as contextual and in particular is related to the physical facilities available and the flexibility the traveler enjoys. The timing of trips is far more important for commutes, for example, than for daily leisure activities, although over a longer period, the constraints of having set periods for vacations can be very restrictive in terms of vacation travel. Equally, travel at peak times of the day, the “rush hour,” or at common vacation times, such as Christmas, can put pressures on the existing transportation infrastructure, with those able trying to avoid the worst excesses of congestion. But the very transport demands of people also affect the de facto supply of capacity available for their individual use. The classic example is that economies of agglomeration, by concentrating employment, tend to push up the demand for urban transport infrastructure use in the morning and evening rush hours, leading to congestion and increasing the prices of these trips. This “club-good” problem, akin to golfers preferring to tee off in the early morning, puts pressure on the transport system leading to high levels of peak period congestion, a phenomenon that has attracted a lot of attention among urban and transportation scientists (Lindsey and Verhoef 2008).

The information traditionally used for analyzing travel behavior and demand has been of the revealed preference form, and the tools of analysis have been Gaussian in orientation. Essentially this has involved the extrapolation of previous travel demand behavior into the future based on statistical analysis of prior relationships between travel and physical, economic, and sociological influences. The relationships are assumed to be constant over time. The stochastic nature of the underlying historical relationship allows confidence intervals to be drawn around the projections.

The traditional modeling approach to handling this information to, for example, forecast the implications for travel behavior of a policy shift in road investment was originally limited in part by the need for computational convenience. The models were developed mainly in the 1960s at a time when large-scale transportation and land-use plans were in vogue with an emphasis on providing road access along major commuter corridors, and this influenced the types of forecasts being sought. But these plans required considerable details of traffic effects over large interconnected networks, which in turn necessitate complex manipulation of very large databases.

The methodology involved breaking down travel demand analysis into a number of sub-questions, the “four-stage model.” In the urban context, this involves what is essentially recursive modeling of the aggregate travel in an area, disaggregating this into trips between areas within the city, disaggregating these trip distributions according to the modes used, and finally assigning traffic to individual routes. In simple mathematical terms, the stages can be expressed as

Ti = f(Xi);  Tj = f(Xj)    (1)
Tij = f(Ti, Tj, Cij)    (2)
Tijm = f(Tij, Cijm, Cijm′)    (3)
Tijmp = f(Tijm, Cijmp, Cijmp′)    (4)

where Ti is the number of trips originating in i, Tj is the number of trips destined for j, Tij is trips between i and j, Xi and Xj are the socioeconomic features of i and j, Tijm is the trips between i and j by mode m, Tijmp is the trips between i and j by mode m along route p, Cij is the generalized cost of travel between zones i and j, Cijm is the generalized cost of travel between zones i and j by mode m, Cijmp is the generalized cost of travel between zones i and j by mode m using route p, and the prime notation refers to alternative modes (m′) or routes (p′). The four stages and their links with inputs, standard feedbacks, and outcomes are seen in Fig. 1.

Fig. 1 The four-stage model sequence. (The transport and activity systems feed trip generation, trip distribution, mode choice, and route choice in turn, with feedback from the resulting equilibrium flows.)

This highly aggregated approach, which looks at travel behavior in terms of zonal flows, however, has limited behavioral content. It also suffers from a number of technical weaknesses, both in terms of the overall model and of individual sub-models. For example, it is recursive by nature, and although feedback loops are possible to reflect the impacts of traffic assignment on aggregate travel, this tends to be a mechanical rather than a behavioral process. It is also difficult to assess the overall statistical fit of the model; a series of relatively small errors in individual components could be compounded over the sequence.

Despite these limitations, these four-stage models are still widely used for land-use transportation planning exercises, in part because they are relatively easy to understand and software is abundant, but intellectually they have been superseded by approaches more embedded in economic and social sciences, in particular by discrete choice and activity-based models.
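Before turning to those disaggregate approaches, a compact two-zone sketch may help fix ideas about the sequence in Eqs. (1)–(4). The functional forms and parameter values are hypothetical illustrations, not those of any operational four-stage package.

import numpy as np

# Stage 1 - trip generation: trip ends from zonal socioeconomics X.
X = np.array([100.0, 60.0])            # hypothetical zonal characteristics
T_i = 2.0 * X                          # Ti = f(Xi), trips originating
T_j = 2.0 * X[::-1]                    # Tj = f(Xj), trips attracted

# Stage 2 - trip distribution: gravity form with generalized costs Cij.
C = np.array([[2.0, 5.0],
              [5.0, 2.0]])
raw = np.outer(T_i, T_j) * np.exp(-0.3 * C)
T_ij = raw * T_i.sum() / raw.sum()     # scale to the total number of trips

# Stage 3 - mode split: binary logit between car and a slower bus mode.
C_car, C_bus = C, C + 1.5
share_car = np.exp(-0.5 * C_car) / (np.exp(-0.5 * C_car) + np.exp(-0.5 * C_bus))
T_ijm = T_ij * share_car               # Tijm for m = car

# Stage 4 - assignment: all-or-nothing between two hypothetical routes.
C_routes = np.stack([C_car, C_car + 0.8])
T_ijmp = np.where(C_routes.argmin(axis=0) == 0, T_ijm, 0.0)  # flows on route p = 0

print(T_ij.round(1), T_ijm.round(1), T_ijmp.round(1), sep="\n")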

Broadly, disaggregate models are characterized by two main features. First, they explicitly recognize that travel decisions emerge out of individuals’ optimizing behavior and that, if the final goods consumed as a result of travel are normal, then at a very minimum the demand for travel ought to be related positively to disposable incomes and negatively to the prices of transportation services. Second, most have their origins in the “attribute theory of demand.” This approach to human behavior assumes that people desire to maximize a utility function that has, as its arguments, commodity attributes rather than the quantities of the actual goods consumed. In other words, if we represent the amounts of attributes by the vector z, the amounts of commodities (in this case travel alternatives) by the vector x, posit a utility function, U(z), and a production of attribute function, G(x), which reflects the attributes of different travel alternatives, and assume that potential travelers are constrained by income, y, and the price of travel, p, then we can reduce the problem to solving

max U(z) subject to z = G(x), x ≥ 0, px ≤ y    (5)
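Anticipating the discrete choice models discussed next, the following sketch generates synthetic binary mode choices from a random-utility rule and recovers the attribute coefficients by maximum likelihood, using plain gradient ascent on the logit log-likelihood; all data and coefficient values are synthetic and hypothetical.

import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 2))            # attribute differences, car minus bus
true_beta = np.array([-1.2, 0.8])      # hypothetical taste coefficients

# Random-utility data generation: car is chosen with logit probability.
p_true = 1.0 / (1.0 + np.exp(-(x @ true_beta)))
y = (rng.uniform(size=n) < p_true).astype(float)   # 1 = car chosen

beta = np.zeros(2)                     # maximum likelihood by gradient ascent
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(x @ beta)))
    beta += 0.05 * (x.T @ (y - p)) / n # score of the logit log-likelihood

print("estimated:", beta.round(2), "true:", true_beta)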

Because the unit of analysis is the household or individual, the decision to make a particular trip or use a specific mode requires some form of discrete model specification. Depending on the nature of the case, these are normally derivatives of logit or probit models; for estimation purposes, they can generally be expressed in log-odds terms as ln(Pi/(1 − Pi)) = v(xi), where Pi is the probability of, say, trip i being made by automobile given the attributes, x, of car travel and alternative modes. The framework can be extended, for example, to situations where there are multiple concurrent choices (the multinomial logit model) or where choices are sequential (nested logit models). Theoretically, these types of model have a firm base in economic science and in the idea of random utility developed by McFadden (1974). The models have the practical advantage of requiring relatively small data sets for estimation, and there is generic econometric software available to handle discrete choice situations. What they do not do is provide a mechanism for looking at large-scale shifts in travel demand across a large area or for integrating travel behavior with wider changes in activity patterns.

While the sequential and disaggregate approaches to transportation demand analysis concentrate on developing sophisticated mathematical simulations of travel behavior, recently there has been a growth of interest in “behavioral realism” and an emphasis on “understanding the phenomenon.” This is sometimes called the activity-based approach because it has sought to embrace a richer, holistic framework in which travel behavior is analyzed as a daily or multiday pattern of behavior related to lifestyles and participation in various activities. The idea is that the demand for travel is derived from the activity patterns of individuals and households. The basic idea has much to do with the concept of time geography dating back to Hägerstrand (1969), in which an individual’s choice of a specific activity pattern is viewed as being the solution to an allocation problem involving limited resources of time and space. In this sense, simply focusing on actual behavior is not that useful; rather there is a need to put more emphasis on the constraints that limit people’s behavior, and these are often more difficult to define and measure.

These approaches to modeling have also been tied in with the greater use of stated preference techniques that question people about their probable travel reactions to, say, a change in gasoline prices or the introduction of a new public transportation service, rather than consider the revealed preferences of people to similar changes in the past. Stated preference methods themselves can be applied to most forms of behavioral analysis, including narrower trip-based work. They have been more strictly defined by Kroes and Sheldon (1988) as “a family of techniques which use individual respondents’ statements about their preferences in a set of travel options to estimate utility functions,” and are often claimed to be helpful when:

• There is insufficient variation in revealed preference data to examine all variables of interest.
• There is a high level of correlation between the explanatory variables in a revealed preference model, making statistical estimation of parameters difficult.
• A radically new technology or policy takes the analysis outside of the realms where current revealed behavior is relevant.
• Variables are not easily expressed in standard units (e.g., when the interest is in the effects on demand of less-turbulent travel by air).

The aim of activity-based modeling, a specific form of interactive modeling, is to develop models that get closer to the essential decision process underlying travel behavior, and it is for this reason that stated preference techniques have generally been favored in this type of work. Rather than simply incorporate variables such as household status in mathematical models because the statistical “explanation” of the model appears to be improved, activity modeling seeks to explain why status affects travel behavior. Theoretically, travel is seen as one of a whole range of complementary and competitive activities operating in a sequence of events in time and space. It is seen to represent the method by which people trade time to move location in order to enjoy successive activities. Generally, time and space constraints are thought to limit the choices of activities open to individuals. The technique is still far from fully developed, but it began to be applied in a relatively limited number of small-scale forecasting studies from the late 1970s.

The emphasis of activity-based models is upon the household (or individual) as the decision-making unit. It focuses on the revealed pattern of behavior represented by travel and activities, both in home and nonhome, over a defined period of time. It thus generally makes use of revealed preferences but can combine this with stated preference analyses for forecasting the impacts of changes in activity options. According to Heggie (1978), ideally, an activity-based model should exhibit six main properties:

• It should involve the entire household and allow for interaction between its members.
• It should make existing constraints on household behavior quite explicit.
• It should start from the household’s existing pattern of behavior.
• It should work by confronting the household with realistic changes in its travel environment and allowing it to respond realistically.
• It should allow for the influence of long-term adaptation.
• It should be able to tell the investigator something fundamental that he did not know before.

In general, the approach is typified by a fairly small sample and careful survey techniques, often involving such things as “board games” – such as the “household activities travel simulator” (HATS) developed by the Oxford University Transportation Studies Unit (Jones 1978) – or other visual aids, frequently computer based these days, to permit households to appreciate the full implications of changes in transportation policy for their own behavior. In a sense, it represents an attempt to conduct laboratory experiments by eliciting responses in the context of known information and constraints.

The early HATS approach was to confront a household with a map of the local area together with a 24-h “strip representation of colored pieces” showing how the current activities of the household are spread over space and throughout the day. Changes to the transportation system were then postulated, and the effects on the household’s activities throughout the day were simulated by adjustments to the strip representation. In this way, changes in the transportation system could be seen to influence the 24-h life pattern of the household, and apparently unsuspected changes in “remote” trip-making behavior can be traced back to the primary change. It makes clear the constraints and linkages that may affect activity and transportation choices.

More recent studies have adopted rather more sophisticated experimentation procedures, often involving computers, which provide for greater flexibility and easier interaction with those being “interviewed” – for a survey, see McNally and Rindt (2008). Examples of programs of this genre include ALBATROS (A Learning-Based Transportation Oriented Simulation System), the first computational process model of the complete activity scheduling process that could be fully estimated from data, and TRANSIMS, an attempt to replace the entire traditional travel paradigm, which has an activity-based front end linked to a population synthesizer and is integrated with a micro-simulation of modeled travel behavior. Computer-based models include STARCHILD and AMOS, with mathematical programming models such as HAPP. The development of geographical information and global positioning systems has allowed for better data availability and real-time surveillance of travel and associated behavior.

While this aspect of the approach has been refined, important technical issues remain regarding the use of information gathered from stated preference type experiments for forecasting. There is still, for example, much to be learned about why some households give strategically biased responses; in particular there are difficulties in handling habit, inertia, and hysteresis in an experimental framework. At a more technical level, John Bates (1988) points to our lack of knowledge about the error structures associated with stated preference data and the particular problems of pooling data across individuals. In contrast to the more traditional revealed preference schools, advocates of this approach, however, point to both the specific recognition that travel is a derived demand and the fact that transportation policies have qualitative, as well as quantitative, effects on people’s lives. In the longer term, when operational models are more fully developed, the framework may offer the much-sought-after basis for integrating land-use and transportation planning assessment. In the short term, the approach has offered useful insights and a method for cross-checking the validity of conventional statistical analysis of behavioral data.

4 The Elasticity of Travel Demand

In many situations, the focal point for travel demand analysis is very specific, relating to a particular issue or policy question such as the impact of a fare rise on transit ridership or the implications for travelers of a new emissions charge. While larger models are often used to assess the general effects of these types of change, the estimation of the appropriate demand elasticity provides guidance as to the specific effects on a target variable. There is an entire range of possible elasticities that may be considered (Oum et al. 2008), but all have the common feature of measuring the proportional change in a behavior relating to a target transportation variable (bus trips, fuel use, carpooling, or whatever) in response to a proportional change in an instrumental variable (a new tax, a new vehicle design regulation, the reduction of a speed limit, or whatever). The estimation of basic elasticities is conceptually fairly straightforward (at its simplest, e = (δQ/Q)/(δP/P), where Q is the “quantity” of travel measured in some way and P is the “price”) and is spelled out in introductory microeconomics texts. For example, if we take a travel demand function, say, for bus travel, of the log-linear form lnQM = α + β1lnPM + β2lnY + β3lnPN, where QM is the quantity of bus services demanded, PM is the fare for bus mode M, Y is income, and PN is the price of an alternative mode N, say, car travel, then the fare elasticity is the parameter β1, the income elasticity of demand, reflecting the sensitivity of the quantity demanded to income changes, is β2, and the cross-elasticity of demand with respect to the cost of car travel is β3.
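As a small illustration of how the β parameters of such a log-linear demand function might be recovered in practice, the following sketch fits the equation by ordinary least squares to synthetic data; the variable names and the data-generating values are hypothetical.

import numpy as np

rng = np.random.default_rng(1)
n = 200
lnP_M = rng.normal(0.0, 0.3, n)        # log bus fare
lnY = rng.normal(3.0, 0.2, n)          # log income
lnP_N = rng.normal(0.5, 0.3, n)        # log cost of car travel

# Hypothetical "true" elasticities: fare -0.3, income 0.8, cross-price 0.2.
lnQ = 1.0 - 0.3 * lnP_M + 0.8 * lnY + 0.2 * lnP_N + rng.normal(0, 0.05, n)

# OLS: the slope coefficients of a log-log regression are the elasticities.
Z = np.column_stack([np.ones(n), lnP_M, lnY, lnP_N])
beta, *_ = np.linalg.lstsq(Z, lnQ, rcond=None)
print("alpha, beta1 (fare), beta2 (income), beta3 (cross):", beta.round(2))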

Here we focus on some of the more transportation-specific nuances and also offer some discussion of the empirical values that have been derived. Generalizations about the size of elasticities are difficult, especially across all modes of transportation, but in many cases it seems clear that price changes within certain limits have relatively little effect on the quantity of travel or transportation services demanded. Further, while demand elasticities often exhibit a degree of stability over time, they do not remain constant, in part because of shifts in the demand function due to such things as rising incomes or changes in consumer tastes. Studies of urban public transportation in the 1970s, for example, covering a variety of countries indicate relatively low price elasticities, with a direct fare elasticity of around 0.3 being considered normal, though estimates emerged as somewhat higher in the 1980s. The effect of price change on private car travel must be divided between the effect on vehicle ownership and that specifically on vehicle use. Most early United Kingdom studies of car ownership, for example, indicated an elasticity of about 0.3 with respect to vehicle price and 0.1 with respect to gasoline price, but empirical work suggests a rather higher sensitivity in the United States: 0.88 purchase price and 0.82 fuel price elasticities. The generally low fuel price elasticity for car use in the short term is attributable to changing patterns of household expenditure between vehicle ownership and use and people’s perception of motoring costs. Bendtsen (1980) brought early findings together in an international comparison that found the petrol price elasticity of demand for car use to be 0.08 in Australia for the period from 1955 to 1976, 0.07 in Britain for 1973/1974, 0.08 in Denmark for 1973/1974 and 0.12 for 1979/1980, and 0.05 in the United States for the period from 1968 to 1975. Oum et al. (1992) found a slightly greater degree of sensitivity when looking at seven studies covering the United Kingdom, the United States, and Australia; they yielded car usage elasticities in the range 0.09 to 0.52.

Table 1 Price elasticities of demand for passenger transportation expressed in absolute values. (The elasticity columns report the range surveyed across studies.)

Mode                  Market demand      Mode choice      Number of
                      elasticities       elasticities     studies
Air
  Vacation            0.40–4.60          0.38             8
  Non-vacation        0.08–4.18          0.18             6
  Mixed(a)            0.44–4.51          0.26–5.26        14
Rail: intercity
  Leisure             1.40               1.20             2
  Business            0.70               0.57             2
  Mixed(a)            0.11–1.54          0.86–1.14        8
Rail: intracity
  Peak                0.15               0.22–0.25        2
  Off peak            1.00               n.a.             1
  All day(a)          0.12–1.80          0.08–0.75        4
Automobile
  Peak                0.12–0.49          0.02–2.69        9
  Off peak            0.06–0.88          0.16–0.96        6
  All day(a)          0.00–0.52          0.01–1.26        7
Bus
  Peak                0.05–0.40(b)       0.04–0.58        7
  Off peak            1.08–1.54          0.01–0.69        3
  All day(a)          0.10–1.62          0.03–0.70        11
Transit system
  Peak(c)             0.00–0.32          0.1              5
  Off peak            0.32–1.00          n.a.             3
  All day(a)          0.01–0.96          n.a.             15

(a) The distinction between vacation and non-vacation routes tends to be arbitrary and may explain the wide range of results
(b) Includes studies that do not distinguish between categories such as peak, off-peak, or daily
(c) Based solely on Hensher (2006)
Source: Oum et al. (2008)

Table 1 provides a survey of some estimates of automobile and public transportation elasticities. The market demand elasticities reflect the impacts on total mileage of a price change, while the mode choice elasticities refer to the probabilities of using a mode as its price varies. We immediately note that the former combine mode shift and distances per trip and thus tend to be greater than mode choice elasticities. Further, we see a wide range of elasticities emerging depending on the nature of the trip, the mode used, and the time of day the travel takes place. In particular, when the timing and need for trips is very rigid, such as journeys to work during peak periods, the elasticities tend to be quite low. Leisure travel, where there is often more flexibility, is more sensitive to travel costs.

There is, in addition to the data in the table, an abundance of evidence that the fare elasticity for certain types of public transport trips is much higher than for others. Business travel demand in particular seems to be relatively insensitive to changes in transportation price compared with other forms of trip. The pioneering work of Kraft and Domencich (1970) found that public transportation work trips exhibited a fare elasticity of 0.17 in Boston compared with 0.32 for shopping trips. A similar pattern is found for business and nonbusiness air travel, with the latter being generally higher. This is a pattern also seen, as one might expect, in terms of the type of fare being paid. Straszheim (1978) found some time ago that, “First class fares can be raised and will increase revenue. . . The {price elasticity of} demand for standard economy service is about unity, and highest for peak period travel . . . The demand for discount and promotional fares is highly price elastic. . .” This conforms to the intuition that vacation travelers have more flexibility in their actions (destinations, times of flights), whereas business trips often have to be taken at short notice. The lower sensitivity associated with first-/business-class fares also reflects the service requirements of users who often seek room to work on planes and in lounges. The estimates of the elasticities are also sensitive to the length of service, with shorter routes generally exhibiting higher fare elasticities, in part because other modes of transportation become viable options.

Users of different forms of transportation, or different services of the same mode, are often confronted with a variety of payment options; their perceptions of the price of a journey may differ from the actual monies expended. Motorists generally perceive very little of the true overall price of their trips because they base decisions on a limited concept of short-term marginal cost – for example, they only buy fuel periodically and do not take this into account when deciding whether to make a particular trip – whereas users of public transportation have to buy tickets before traveling, making them more aware of the costs of their behavior. Nevertheless, because of the range of season tickets that permit bulk buying of journeys over a specific route, and travel card facilities that permit bulk buying of journeys over a specified network, the distinction is not a firm one.

As with other purchasing decisions, people confronted with a change in transportation price generally act differently in the ultrashort run, the “market period,” the short run, and the long run – Table 1 offered some examples. Immediate reactions, in the ultrashort term, to a public transportation fare rise may, for example, be dramatic, with people, almost on principle, making far less use of services, even boycotting them, but such knee-jerk reaction is extremely short lived and seldom considered by economists, although it is often of interest to politicians.
In the slightly longer “market period,” people revert to their initial behavior, relatively unresponsive to a price change, either because they do not consider it a permanent change or because technical constraints limit their immediate actions; the elasticity is virtually zero. Over time, people can adjust their behavior: in the short run they can change their travel patterns by switching modes, combining trips, and cutting out some travel, and businesses can reschedule the use of their vehicle fleets and modify their collection and delivery patterns. The demand for car travel, therefore, becomes more elastic in relation to the new price. In the long term, people can change the type of car they use and their employment and residential locations, and industry can modify its entire supply chain. Taking a specific context, when considering the effect of general rises in commuter travel costs, the necessity of having to make journeys to work is likely to result in minimal changes in travel patterns in the short term, but over a longer period, relocations of either residence or employment may produce a more dramatic effect. This implies that one must take care when assessing elasticity coefficients, and it is useful to remember that cross-sectional studies tend to offer estimates of long-run elasticity while time-series studies reflect short-term responses.

Elasticities are also generally found to increase the longer the journey under consideration. This is not simply a function of distance but rather a reflection of the absolute magnitude of, say, a 10% rise on a $5 fare compared with that on a $500 fare. It is also true that longer journeys are made less frequently, and thus, people gather information about prices in a different way. Additionally, they often tend to involve leisure rather than business travel; this suggests that distance may be picking up variations in trip purpose. In the air transportation market, for example, DeVany (1974) found in a classic study that price elasticity rose from 0.97 for a 440-mile trip in the United States to 1.13 for an 830-mile trip.

Turning to the effects of income on travel behavior, while there is ample evidence that travel is a normal good in the sense that more is demanded at higher levels of income, this generalization does not apply to all modes of travel or to all situations. At the national level, income exerts a positive influence over car use, but this is not so clear-cut with public transportation use, and in some cases the latter becomes an “inferior good,” with its use falling after some level of income has been attained. Gwilliam and Mackie (1975), for example, suggest that the long-run elasticity of demand with respect to income was of the order 0.4 to 1.0 for urban public transportation trip making in the United Kingdom. They argue that although car ownership rises with income, and hence some trips are diverted from public transportation, there is still a limited offsetting effect inasmuch as wealthier households make more trips in total. This effect would seem to be less relevant today with much higher levels of automobile ownership in developed countries, a fact borne out in the findings of Crôtte et al. (2009) in the context of Mexico. The income elasticity of demand for many other modes of transportation is seen to be relatively high, and especially so for modes such as air transport. Taplin (1980), for example, suggests a figure of the order of 2.1 for vacation air trips overseas from Australia. By its nature, air travel is a high-cost activity, with the absolute costs involved being high even where mileage rates are low, so that income elasticities of this level are to be expected. There is also some evidence that wealth influences the demand for air travel, with Alperovich and Machnes’s (1994) study of the Israeli market finding a wealth elasticity of 2.06.
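The link between short- and long-run responses discussed in the next paragraph can be made explicit with a simple partial-adjustment (Koyck) formulation – a standard textbook device, offered here as an illustration rather than as the specification the studies cited necessarily employed. Writing lnQt = α + β lnYt + λ lnQt−1 + εt, the short-run income elasticity is β and the long-run elasticity is β/(1 − λ). A short-run elasticity of 0.6 paired with a long-run value of 1.44, as in the Reza and Spiro (1979) estimates cited below, would then imply an adjustment parameter of λ = 1 − 0.6/1.44 ≈ 0.58, with roughly 42% of the eventual response occurring in the first period.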


As with price, income changes exert somewhat different pressures on travel behavior in the long run compared with the short. In general, a fall in income produces a relatively dramatic fall in the level of demand, but as people readjust their expenditure patterns in the long term, the elasticity is likely to be much lower. Reza and Spiro (1979), for example, produce an estimate of 0.6 for the short-run income elasticity of demand for petrol, rising to 1.44 in the long run. If one assumes that gasoline consumption is a proxy for trip making, then one could attempt to justify this in terms of a slow reaction to changing financial circumstances – a reluctance, for example, to accept immediately the consequences of a fall in income. In fact, the situation is likely to be more complex, because the long run may embrace changes in technology, and possibly locations, that alter the fuel consumption–trip-making relationship. Thus, these figures may still be consistent with the initial hypothesis regarding the relative size of short- and long-run income elasticities of demand for travel.

The demand for any particular travel service is likely to be influenced by the actions of competitive and complementary suppliers – the cross-elasticity effect. Strictly speaking, it is also influenced by prices in all other markets in the economy that touch upon the importance of motoring costs vis-à-vis the demand for public transportation services. There are wide variations in results that generally reflect the adoption of alternative estimation procedures and time-lag allowances, as well as the peculiarities of the local travel situation. One of the more interesting points is the almost total insensitivity of the demand for urban car use to the fare levels of both bus and rail public transportation modes. This fact, which has been observed in virtually all studies of urban public transportation, is the main reason that attempts by city transportation authorities to reduce or contain car travel by subsidizing public transportation fares have, in the main, proved unsuccessful.

Transportation demand is also sensitive to the quality of service offered, although measurement of the relevant elasticities is challenging because many service attributes are essentially intangible and because of this are treated in qualitative terms. It is noticeable, for example, from empirical studies that public transportation demand is sensitive to changes in service quality attributes such as reductions in speed or frequency of services; other attributes that are often intuitively seen as important are not easily isolated. Lago et al. (1981) examined a wide range of international studies concerned with urban public transportation service elasticities and concluded that increased service levels do not generate proportional increases in passengers and revenues, but their analysis looked at service quality attributes in isolation rather than as a package of service features and missed many qualitative attributes. The survey also highlights that service quality is far more important when the initial level of service is poor; the general elasticities found for peak period ridership, for instance, are much lower than those for the off peak. Further, it found that the service headway is one of the more important service variables; the studies examined indicate an elasticity of 0.42 compared with 0.29 for in-vehicle bus travel time.

5 Using Travel Behavior and Travel Demand Information

Now we deviate a little from what is often contained in previous handbooks concerned with the "science" of analyzing travel behavior and spend some time on the more practical matter of how research on travel behavior and demand is often applied. This is very much in the realm of regional political science, but it is germane to the context in which the more positive aspects of the science of travel behavior and demand are treated. The implicit view of Milton Friedman (1953) on model performance can be recalled here: "The ultimate goal of a positive science is the development of a 'theory' or 'hypothesis' that yields valid and meaningful (i.e., not truistic) predictions about phenomena not yet observed." Simply producing a model of travel behavior that tells stories about the past is unlikely to be either useful or good science.

While there are numerous studies and texts on academic modeling of travel behavior, in reality the importance of this work for regional science lies in just how useful the analysis is for forecasting. While there may be a historical interest in knowing why particular land-use patterns have emerged or why particular local industries have thrived, the main rationale for studying travel behavior and travel demand is to be able to use the information gleaned for policy development. This means understanding not only why the current demand patterns exist but also how they may change in the future. Most existing models can explain ongoing trends fairly well, and trend breaks to some extent, but they are poor at explaining new trends. The advent of real-time information systems and mobile communication platforms has, for example, added to the way people view trip making and the speed at which they change their travel plans. The switch to large-scale service sector employment is changing perceptions of working hours. These were developments not foreseen 20 or so years ago when much of the thinking surrounding the then-current travel behavior and demand models was emerging.

One of the main purposes of trying to get a handle on travel behavior and demand is to assist in policy-making, in both the public and the private sector. While there is a plethora of academic interest in trying to improve our understanding of why people travel and the nature of their trips when they do go into motion, there is somewhat less study of travel behavior by the private sector that provides the hardware of travel, such as rail track and automobiles, the software, such as insurance, and the services that cater for some of its side effects, such as medical care. Just reflecting on the last of these, changes in modes of road travel between public and private means affect the types of injuries associated with accidents and the incidence of ailments such as severe asthma. In terms of the automobile industry, the differing driving patterns of various age and income groups affect the demand for its models, as do social attitudes toward various modes of transport. Aside from those directly involved in transportation, there are others with an interest in travel behavior, not least of which are the fiscal authorities. Transportation is both a large generator of taxation revenue and a major sump hole for subsidies. There are also the demands of transportation users such as the military that are seldom considered within conventional academic modeling, or at least in the material that appears in the public domain.


Each group has an interest in a particular aspect of travel behavior. Transportation and land-use planners often want a longer view to assess the implications of fixed, often multimodal, infrastructure investments and their interactions with economic and social development. The car industry is focused on forecasts that lead to the commercial success of its products, and this involves a somewhat shorter time horizon. Most tax authorities do not often go beyond the myopia of trying to balance this year's books.

Following the greater academic herd, we focus on the types of demand analysis used in land-use transportation planning and policy-making. Here hard numbers are generally required concerning the future use of links and nodes in the relevant transportation system. These are needed in particular for engineering design purposes and in the estimation of costs to ensure appropriate funding is available.

The situation is not a very satisfactory one, and despite the efforts of analysts, the poor quality of transportation forecasts used in the field has been known for some time. The 1970s saw considerable debate in the United Kingdom, for example, over inaccuracies in the forecasting of car ownership, a major input into traffic modeling, and the 1980s over inaccuracies in the traffic forecasts themselves. The forecasts for the M25 London orbital road, for instance, were that, on 21 of the 26 three-lane sections, the traffic flow would be between 50,000 and 79,000 vehicles a day in the 15th year of operation, whereas the flow within a very short time was between 81,400 and 129,000. In the 1990s, a series of studies in the United States, including those by Kain (1990) and Pickrell (1992), brought into question the forecasts of transit ridership and the financial costs of investments. Pickrell's study of programs funded by the United States Urban Mass Transportation Administration, for example, found that the ten projects examined produced major underestimates of costs per passenger (e.g., the costs for the Miami heavy rail transit were 872% of those forecast, and for Detroit's downtown people mover, they were 795%); only the Washington heavy rail transit project experienced actual patronage more than half of that forecast. Updating of this work, looking at cost and ridership forecasts for 47 United States transit systems, indicates only limited improvements over time (Button et al. 2010) when allowance is made for the composition of projects (e.g., light and heavy rail) and for whether an investment was in a strictly new system or an extension of an existing one. A series of studies in the early 2000s by Flyvbjerg et al. (2004, 2005), looking across a range of surface transportation forecasts, has shown considerable inaccuracies extending to most western economies and provides confirmation of the poor performance of forecasts. There emerges, in particular, a tendency for overprediction of capacity utilization and underprediction of the outturn costs of investments. For example, for ten rail projects from a variety of countries, the passenger forecasts overestimated traffic by 106%, whereas for road projects, the forecasts tend to be wrong by about 20% but with the errors spread equally around the ultimate flows. In terms of costs, an examination of 58 rail projects indicates overruns averaging nearly 45%, and for 167 road investments, overruns of 20.4%. It is not just public sector decision-making per se that is often based on poor traffic forecasts.
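The inaccuracy statistics quoted above are simple transformations of forecast and outturn values. A short sketch of how cost overruns and actual-to-forecast traffic ratios are typically computed; the project records here are invented for illustration, not the actual Pickrell or Flyvbjerg data.

# Hypothetical project records: (forecast, actual) for cost and ridership.
projects = {
    "Rail A": {"cost": (1000, 1450), "riders": (60000, 29000)},
    "Road B": {"cost": (500, 604), "riders": (40000, 46000)},
}

for name, rec in projects.items():
    fcost, acost = rec["cost"]
    friders, ariders = rec["riders"]
    overrun = 100 * (acost - fcost) / fcost  # cost overrun as % of forecast
    ratio = ariders / friders                # actual/forecast traffic ratio
    print(f"{name}: cost overrun {overrun:+.1f}%, "
          f"actual/forecast traffic {ratio:.2f}")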
An American study by JP Morgan (1997) of 14 privately financed and operated toll roads found that only one exceeded the projected return, with four projects overestimating returns by at least 30%. The overall conclusion was that "reducing the uncertainty associated with these forecasts represent[s] one of the major challenges for transportation agencies, traffic consultants, investment bankers and investors." In a study of over 100 international, privately financed road project appraisals conducted between 2002 and 2005, Bain (2009) concludes that "...in terms of error, the predictive accuracy of traffic models – used for toll road or toll free road forecasts – is poor." Again there emerges a proclivity to overestimate traffic flows, with the ratio of actual to forecast traffic falling below unity for the majority of studies, although in some cases the predictions underestimated by up to 51%. The accuracy of forecasts does not appear to have improved over time, although, of course, that is not to say that some have not proved to be very reliable.

Travel behavior and demand forecasts are a major factor both in the decisions to undertake investments and in their design; serious errors in foreseeing changing patterns of travel can thus result in misuses of resources across modes and probably between travel and other activities in the economies concerned. The problem may well, in fact, be considerably worse than the quantifications by Pickrell, Flyvbjerg, and others cited earlier, in that there is no way of knowing if forecasts of investments that were rejected were biased and instrumental in the rejections. If so, then resources that would have earned a social return in those projects would have been transferred to some other use where the net benefits are less. We only have data on the investments that actually materialized and on the policies that were actually adopted.

The difficulties that emerge, however, are only partly a function of strict forecasting errors; there are three broad and entwined reasons why travel behavior and demand forecasts are generally poor: technical problems of the type we have discussed, problems in carrying out the forecast, and problems in the way the forecast is used. The last of these is very much in line with the public choice theory of economic science, which focuses on rent seeking and coalitions of interest. Thus while we have tended to couch the discussion as a normative issue, it could be reexpressed in positive scientific terms along the lines of Friedman. Each facet of the problem, however, is not constant; each is often entwined with another and can vary with circumstances.

Regarding the first, there is a widespread proclivity to look at the technical merits of the models being used rather than their practical use as a forecasting instrument; elegance, sophistication, and the ability to backcast are often seen as the criteria for a good forecasting model, whereas in many cases forecasting exogenous variables is more difficult than predicting traffic flows, however well the model fits historical data. There are, however, clearly challenges in the collection of the data needed to calibrate the parameters of the models used to produce forecasts and in predicting even the short-term future path of many explanatory variables. Most forecasts rely on extrapolations of previous behavior, but divergences from historical relationships do occur and are often seen in wider institutional changes reflecting social priorities and attitudes to things like the environmental impacts of transportation, its safety, and its security.
The use of such techniques as sensitivity analysis and simulations can provide some insights, but the range of possibilities, and thus the potential for error, increases as the length of the forecast period gets longer.

Availability of data is always a problem for forecasting travel behavior, and the situation may be getting worse. There are certainly better survey design techniques and more ways to gather travel-related data, ranging from online surveys to the use of global positioning systems for tracking movement. There is also evidence of improvements, although some may disagree, in our ability to produce reasonable medium- and long-term forecasts of such things as income, demographics, and fuel prices that feed into travel behavior projections. Against this, the move toward lighter regulation and privatization of travel facilities has changed and often reduced the public sources of data, and especially economic data, available to forecasters to carry out their work.

Despite these largely technical and operational challenges, the problems in forecasting accuracy would often seem to lie more in the way forecasting is actually done and the way results are used. A problem initially highlighted by Kain (1990) is that forecasts and travel behavior analysis are not politically neutral, and many decisions regarding transportation investments and policy are not made in the public interest but to some degree serve the ends of those who are making or using them. Basically, the forecasting process and output can be seen as captured by those who commission the forecasts and then make use of them. These may be politicians who wish to win reelection and thus positively assess the short-term gains of supporting high-use/low-cost investments provided by some forecasts against other less "optimistic" projections, or they may be bureaucrats concerned with increasing public sector activities. Under some forms of government, and particularly federal systems, "pork-barrel politics" can incentivize the use of biased forecasts by local agents to gain central funding: a principal-agent issue. The system is, in strict terms, corrupt, and even if forecasters are neutral in their work, this has little influence on the way their output is interpreted and used. Thus, trying to close the gap between forecast and actual outcomes is often seen as a matter of institutional reform rather than better science (Wachs 1990).

The forecasting problem should not be seen as unique to the public sector; there are similar institutional issues when considering some forms of private sector forecasts, especially when they involve concessions. The incentive for those tendering for such things as build-operate-transfer projects, according to Bain, is to be optimistic with forecasts so as to win the contract and gain financial support. While the evidence for this practice is largely anecdotal in the case of tolled roads, there is more support in the context of airport concessions, where ex post renegotiations are relatively common when traffic flows fall short of forecasts.

6 Conclusions

From an economic perspective, transportation is not at all special; when deciding whether to use it, people compare its relative generalized cost against the perceived potential benefits. As with most things, people are often disappointed because the benefits of the trip do not live up to expectation or the costs include elements they had not foreseen. From the point of view of assessing the impact of policies that change travel behavior on regional economic performance or local social conditions, however, it is the perceptions rather than the actuality that are important. Modeling this, despite the veneer of simple mathematics that is often used as the language of debate, is far from easy, especially in terms of providing good forecasts of travel behavior to feed into regional analysis. The situation becomes even more complicated in practice because the way forecasts are used is not neutral but reflects a larger "political" process that itself needs to be modeled.

The progress that has been made in the pure technique of travel behavior and demand modeling has, however, been significant in moving away from what was essentially seen as a mechanical, engineering process to one that embodies an acceptance that human behavior is more complex and less systematic than the early analyses assumed. Travel demand analysis has also become more integrated into larger regional and urban modeling, with interactive relationships largely replacing the idea of a recursive structure in which land-use characteristics have a one-way influence on travel behavior, activity analysis being the most important element in this. The future of demand analysis is already being made easier by the availability of big data sets and information on an almost real-time basis. But new challenges are also emerging as innovative transportation options, such as app-based ride-hailing in the form of Uber and the like, become available and as driverless vehicles become a reality.

References

Alperovich G, Machnes Y (1994) The role of wealth in demand for international air travel. J Transp Econ Policy 28:163–173
Bain R (2009) Error and optimism bias in toll road traffic forecasts. Transportation 36:469–482
Bates J (1988) Econometric issues in stated preference analysis. J Transp Econ Policy 22:59–70
Bendtsen PR (1980) The influence of price of petrol and of cars on the amount of automobile traffic. Int J Transp Econ 7:207–213
Button KJ, Doh S, Hardy MH, Yuan Y, Zhou X (2010) The accuracy of transit system ridership forecasts and capital cost estimates. Int J Transp Econ 37:155–168
Crôtte A, Noland RB, Graham DJ (2009) Is the Mexico City metro an inferior good? Transp Policy 16:40–45
DeVany AS (1974) The revealed value of time in air travel. Rev Econ Stat 56:77–82
Flyvbjerg B, Holm M, Buhl SL (2004) What causes cost overruns in transport infrastructure projects? Transp Rev 24:3–18
Flyvbjerg B, Holm M, Buhl SL (2005) How (in)accurate are demand forecasts in public works projects? The case of transportation. J Am Plan Assoc 71:131–146
Friedman M (1953) The methodology of positive economics. In: Friedman M (ed) Essays in positive economics. University of Chicago Press, Chicago
Gwilliam KM, Mackie PJ (1975) Economics and transportation policy. Allen and Unwin, London
Hägerstrand T (1969) What about people in regions? Pap Reg Sci Assoc 24:7–21
Heggie I (1978) Putting behaviour into behavioural models of travel choice. J Oper Res Soc 29:541–550
Hensher DA (2006) Bus fares elasticities. Working paper, Institute of Transport and Logistics Studies, The University of Sydney
Jones PM (1978) School hour revisions in West Oxfordshire: an exploratory study using HATS. Technical report, Oxford University Transport Studies Unit
Kain J (1990) Deception in Dallas: strategic misrepresentation in rail transit promotion and evaluation. J Am Plan Assoc 56:184–196
Kraft G, Domencich TA (1970) Free transit. Heath, Lexington
Kroes EP, Sheldon RJ (1988) Stated preference methods: an introduction. J Transp Econ Policy 22:11–26
Lago AM, Mayworm P, McEnroe JM (1981) Transit service elasticities – evidence from demonstration and demand models. J Transp Econ Policy 15:99–119
Lindsey R, Verhoef E (2008) Congestion modelling. In: Hensher DA, Button KJ (eds) Handbook of transport modelling, 2nd edn. Elsevier, Amsterdam
McFadden D (1974) Conditional logit analysis of qualitative choice behaviour. In: Zarembka P (ed) Frontiers in econometrics. Academic, New York
McNally MG, Rindt CR (2008) The activity-based approach. In: Hensher DA, Button KJ (eds) Handbook of transport modelling, 2nd edn. Elsevier, Amsterdam
Morgan JP (1997) Examining toll road feasibility studies. Munic Financ J 18:1–12
Oum TH, Waters WG, Yong JS (1992) Concepts of price elasticities of transport demand and recent empirical evidence. J Transp Econ Policy 26:139–154
Oum TH, Waters WG, Fu X (2008) Transport demand elasticities. In: Hensher DA, Button KJ (eds) Handbook of transport modelling, 2nd edn. Elsevier, Amsterdam
Pickrell DH (1992) A desire named streetcar: fantasy and fact in rail transit planning. J Am Plan Assoc 58:158–176
Reza AM, Spiro HM (1979) The demand for passenger car transportation services and for gasoline. J Transp Econ Policy 13:304–319
Straszheim MR (1978) Airline demand functions on the North Atlantic and their pricing implications. J Transp Econ Policy 12:179–195
Taplin JHE (1980) A coherence approach to estimates of price elasticities in the vacation travel market. J Transp Econ Policy 14:19–35
Wachs M (1990) Ethics and advocacy in forecasting for public policy. Bus Prof Ethics J 4:141–157

Activity-Based Analysis

Harvey J. Miller

Contents
1 Introduction
2 Conceptual Foundations of Activity-Based Analysis
3 Policy and Technology Context for Activity-Based Analysis
4 Activity Data Collection and Analysis
5 Activity-Based Modeling
6 Frontiers in Activity-Based Analysis
7 Conclusion
References

Abstract

Activity-based analysis (ABA) is an approach to understanding transportation, communication, urban, and related social and physical systems using individual actions in space and time as the basis. Although the conceptual foundations, theory, and methodology have a long tradition, until recently an aggregate trip-based approach dominated transportation science and planning. Changes in the business and policy environment for transportation and the increasing availability of disaggregate mobility data have led to ABA emerging as the dominant approach. This chapter reviews the ABA conceptual foundations and methodologies. ABA techniques include data-driven methods that analyze mobility data directly as well as develop inputs for ABA modeling. ABA models include econometric models, rule-based models, and microsimulation/agent-based models. This chapter concludes by identifying major research frontiers in ABA.

H. J. Miller (*)
Department of Geography, The Ohio State University, Columbus, OH, USA
e-mail: [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_106


Keywords

Communication behavior · Time path · Mobile object · Microsimulation model · Time geography

1 Introduction

Activity-based analysis (ABA) refers to treating individual actions in space and time as a basis for understanding human mobility and communication behavior and related systems such as cities, economies, and the physical environment. ABA is replacing aggregate trip-based approaches as the basis for forecasting and knowledge construction in transportation science and urban planning. ABA has long-recognized advantages over trip-based approaches, not the least being theoretical validity. In addition, ABA can capture complex constraints and linkages that determine mobility better than aggregate, trip-based approaches. ABA also admits a wider range of policy variables, including non-transportation solutions to mobility problems.

Until recently, the data and computers did not exist to apply ABA to realistic scenarios. These limits have been shattered by increasingly powerful computers, but especially by individual-level data available through wireless location-aware technologies embedded in infrastructure, attached to vehicles, and carried by people. These data are enhancing activity data analysis and modeling techniques. They are also leading to a new, data-driven approach to ABA based on exploratory analysis and visualization methods.

The next section of this chapter discusses the conceptual and practical foundations of ABA. It first reviews the traditional, trip-based approach and identifies key weaknesses. The activity-based approach resolves some of these weaknesses by treating mobility and communication not as disembodied flows but as humans conducting the activities that comprise their lives. Section 3 reviews policy and technological changes that are leading to advances and wider application of the ABA approach. Section 4 reviews data collection and data analysis methods for ABA. Section 5 discusses activity-based models of travel patterns and urban dynamics using econometric, rule-based, and simulation methods. Section 6 identifies ABA research frontiers.

2 Conceptual Foundations of Activity-Based Analysis

The past century of transportation science was dominated by a trip-based approach to understanding and predicting human mobility. This approach focuses on isolated acts of mobility as the primary object of study. A trip is a movement of a person, goods, and/or vehicle from an origin to a destination (possibly the same location) motivated by positive factors at the locations (push factors at the origin, pull factors at the destination) and attenuated by negative factors related to the cost of mobility between the directed pair. Each trip occurs independently of other activities and trips that occur during individuals' lives. People, events, and activities are atemporal; time is simply a component of mobility cost. Finally, the trip-based approach treats mobile entities not as unique objects but as undifferentiated flows between areas such as traffic analysis zones, postal units, or census geography (although it can consist of subflows representing different cohorts) (Pinjari and Bhat 2011).

Weaknesses of the trip-based approach include (McNally and Rindt 2007):

- No recognition that mobility derives from activity participation.
- The treatment of mobility events as resulting from independent and generally unencumbered choice processes, simplifying the complex spatial and temporal constraints that delimit (and sometimes determine) choice.
- A focus on utility maximization, neglecting alternate heuristics related to factors such as decision complexity and habits.
- A neglect of the roles played by interpersonal relationships and information in influencing activity, mobility, and communication behavior, including information and communications technologies (ICTs).

The activity-based approach focuses on the individual and her or his need to participate in activities that have limited availability in time and space. Mobility is not fundamental but an epiphenomenon: it derives from the need to be physically present for many activities and the "inevitability of distance" between activity locations (Ellegård and Svedin 2012). Telepresence via ICTs can substitute for physical presence but can also complement physical mobility by providing more information about events and opportunities as well as capabilities for interpersonal interaction and coordination. Individual and joint allocation of scarce time is the meaningful starting point to understand activity, travel, and communication at all scales: from the tasks required to fulfill daily projects to the annual and decadal dynamics that affect cities, regions, and the planet (Pred 1977).

Strengths of the activity-based approach are (McNally and Rindt 2007):

- Recognition that mobility derives from activity participation.
- Explicit treatment of the complex temporal and spatial constraints on activity participation and mobility.
- Flexibility to accommodate a wide range of decision processes and heuristics.
- Explicit treatment of social organization, social networks, and ICTs that influence activity and mobility behavior.

Table 1 summarizes major components of activity theory. As Table 1 illustrates, mobility – trips or tours – is only a component of a more expansive view of human behavior that includes activity patterns and scheduling as well as the social context that influences these activities.


Table 1 Elements of activity theory

Activity: The main purpose underlying behavior conducted at a specific location and time interval; often classified as fixed versus flexible based on the relative ease of rescheduling and relocation
Activity frequency: The number of times an activity occurs during a given time period
Activity location: Geographic or semantic location where an activity occurs
Activity pattern: Set of activities to be conducted during a specific time interval
Activity schedule: Planned sequence and timing of activities to be conducted during a specific time interval
Time budget: Available time for mobility, communication, and activity participation during a given time interval; often expressed relative to flexible activities and constrained by fixed activities
Trip: Physical movement between activity locations
Interaction: Communication between individuals or locations
Tour: A multi-stop and often multipurpose trip involving several activity locations
Mode: Technique or service used to generate mobility and/or communication behavior
Activity space: Geographic region within which a set of activities occur; can be the composite of discrete activity locations or the smallest spatial region or subnetwork that encompasses the activity locations
Activity environment: Spatiotemporal configuration of activity locations within a given geographic environment
Household: Basic unit of domestic maintenance; influences activity participation through task organization, coordination, and sharing
Social network: Interpersonal relationships, both formal and informal, that influence activity participation
Lifestyle: Socioeconomic and demographic factors that influence activity, mobility, and communication behavior
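The elements in Table 1 translate naturally into data structures for activity-based software. The sketch below is illustrative only: the Python field names, the hour units, and the fixed/flexible flag are assumptions for exposition rather than any standard schema.

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Activity:
    """An activity characterized by the elements of Table 1 (illustrative)."""
    purpose: str                               # e.g., "work", "grocery shopping"
    location: Optional[str] = None             # geographic or semantic location
    window: Tuple[float, float] = (0.0, 24.0)  # availability (clock hours)
    duration: float = 1.0                      # minimum participation time (h)
    fixed: bool = False                        # fixed vs. flexible activity

@dataclass
class ActivitySchedule:
    """Planned sequence and timing of activities for one person-day."""
    activities: List[Activity] = field(default_factory=list)

    def time_budget(self, day_hours: float = 24.0) -> float:
        """Time left for travel and flexible activities after fixed ones."""
        return day_hours - sum(a.duration for a in self.activities if a.fixed)

day = ActivitySchedule([
    Activity("work", "office", (8.0, 18.0), 8.0, fixed=True),
    Activity("grocery shopping", "store", (9.0, 21.0), 0.5),
])
print(day.time_budget())  # 16.0 hours remain outside the fixed work activity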

The view that human activities in space and time are the meaningful starting point to understand and manage transportation, cities, and regions dates back to the time-use studies of Chapin (1974) and an influential paper by Jones (1979) that articulated the ABA framework in its contemporary form. But much of the conceptual foundation for ABA was developed by Torsten Hägerstrand in his time geographic framework (Pinjari and Bhat 2011; McNally and Rindt 2007). Time geography underlies many of the core ideas in ABA, including an ecological perspective on human and physical phenomena and the need to build macro-level explanations from the micro-level, situating travel within a larger context and facilitating the recognition of non-transportation solutions to transportation problems. Basic time geographic concepts, such as the individual trading time for space in movement among activity locations distributed in time and space, may seem trivial since they are so close to everyday life experiences. But this is precisely the point Hägerstrand is making: we neglect seemingly inconsequential but critical factors in our scientific explanations of human behavior; the trip-based approach is an exemplar. Time geography provides a conceptual framework that obligates recognition of basic constraints underlying human existence, as well as an effective notation system for keeping track of these existential facts (Ellegård and Svedin 2012).

3 Policy and Technology Context for Activity-Based Analysis

Transportation scientists, engineers, and planners have long recognized the weaknesses of a trip-based approach with respect to validity and accuracy, and the potential of an individual-level, activity-based approach for better understanding and more accurate predictions of transportation and related human-physical systems. However, until recently there has been little incentive for ABA in policy and planning. There was also little capability with respect to data and computing power.

The last century has witnessed an unprecedented explosion in human mobility due to the development of technologies and services such as steamships, railroads, private automobiles, and commercial aviation. In today's world, people travel to a degree that would have seemed magical to our ancestors. While there are obvious benefits from mobility, there is also increasing recognition of its market failures such as congestion, poor air quality, accidents, sprawled cities, obesity, social exclusion, and global warming. High mobility levels are also increasingly under threat from aging infrastructure that is not being sufficiently renewed, increasing urbanization (especially in the Global South), and increasing motorization as newly emergent economies generate rising levels of wealth.

It is also increasingly difficult to separate mobility and communication behaviors. The telegraph, telephone, and the Internet have revolutionized communication, but these technologies were tightly coupled with location. The rise of mobile telephony and pervasive computing has liberated telecommunication from specific places, allowing it to be more integrated with people and their activities. This is creating tighter, more complex linkages between mobility and communication. Evidence indicates that the "Death of Distance" argument that geographic location would become irrelevant is naïve: communication complements as well as substitutes for mobility, leading to higher mobility demands at all geographic and temporal scales as well as greater complexity of mobility and activity patterns.

Increasing recognition of transportation market failures, threats to mobility, and the tighter integration of mobility and communication behavior have led to new scientific, policy, and planning initiatives in Europe, North America, and increasingly elsewhere. The business and policy environment for transportation policy and planning is evolving beyond simple measures and prescriptions that focus primarily on measuring throughput relative to cost. There is wider consensus that mobility should be managed, not simply maximized. There is also recognition that evaluating transportation performance requires a fuller range of measures, including indicators of effectiveness, equity, community livability, and sustainability. Planners have also realized that solving transportation problems requires thinking outside the system to the broader activity and communication patterns that drive complex mobility behavior. This may include non-transportation remedies for transportation problems (e.g., work flextime, different trading and service hours).

Approaching policy questions from the ABA perspective starts with underlying activity patterns, their interdependencies, and the potential rebound effects that occur from policy changes. Figures 1a, b provide a simple example (after Ben-Akiva and Bowman 1998).

[Fig. 1 (a) Activity-based approach to policy analysis: before policy intervention. (b) Activity-based approach to policy analysis: after policy intervention]

Figure 1a illustrates a daily activity pattern that includes being at home, working, stopping at a day-care center to and from work, and shopping for groceries. Implementing this activity pattern involves a single tour from home to the day-care center and work in the morning, shopping in the late afternoon, stopping again at the day-care center and returning home in the evening, mostly alone in a private vehicle. Figure 1b illustrates the outcome of a policy intervention: an employer-sponsored public transit incentive combined with higher parking costs. Implementation of the activity pattern now requires three home-based tours: trips to/from the childcare center by car in the morning, commuting by bus to and from work during peak times, and shopping by private vehicle in the late afternoon with a stop at the childcare center on the way home.

Is this new policy a success? A trip-based approach would likely reach this conclusion since it would focus on the commuting behavior and find a reduction in travel demand by private vehicle. However, an activity-based approach would be more likely to conclude that the new policy was a mixed success due to the shifting of travel and activity patterns and the increase in home-based trips by car. An activity-based approach would capture the linkages between these events and suggest that the transportation policy change should be accompanied by supportive, non-transportation policies such as incentives for day-care centers at workplaces and/or residential areas.

ABA is more challenging than a trip-based approach: the number of sequencing, timing, location, mode, and route choice possibilities for even a daily activity pattern is combinatorial. There are also a large number of household, social network, and informational linkages that determine daily, weekly, monthly, annual, decadal, and lifetime activity patterns. Activity-based comprehensive urban models also consider the reactions and dynamics of broader infrastructure, economic, sociodemographic, and political systems. Determining a meaningful boundary around the system being analyzed and the level of resolution for representing different components is critical. This requires judgment that considers the scientific and policy questions being asked, as well as theoretical correctness and consistency (Ben-Akiva and Bowman 1998).

With respect to capabilities for ABA, digital data collection, storage, and processing costs have collapsed to an astonishing degree. Location-aware technologies (LATs), digital devices that can report their geographic location densely with respect to time, have become inexpensive and effective. They are increasingly embedded in vehicles and infrastructure and carried by people in consumer products such as smartphones. LATs are generating massive amounts of fine-grained data about mobility and communication dynamics as well as the dynamics of the broader social and environmental systems within which they are embedded.

Computers are also much better at handling these data. In addition to dramatic increases in computing power, geographic information systems (GIS) and spatial database management systems (SDBMS) have evolved well beyond their origins in computer-based paper maps to include a wide range of tools for managing, querying, analyzing, and visualizing dynamic and moving objects data. Social media available through mobile communication devices allow users to obtain better information about transportation systems, share user-generated content, and even participate in management and governance. New interdisciplinary fields such as computational transportation science are emerging to exploit data collection, processing, and communication capabilities to solve vexing and increasingly critical transportation challenges. Private sector companies such as IBM envision smarter transportation, smarter cities, and a smarter and more sustainable planet by collecting fine-grained sensor data, processing these data into meaningful metrics, and sharing this information widely to support more collaborative decision-making. There are critical privacy questions that must be resolved (discussed below), but these data and tools have the potential to revolutionize transportation science and planning from the "bottom up": a new science and practice built from individual activities in space and time as the core concept.

4 Activity Data Collection and Analysis

ABA includes a rich suite of tools for empirical measurement and analysis of mobility and communication behavior. The conceptual origins of this approach are based in time geography, but the approach has been revolutionized by the rise of LATs and the availability of individual-level mobility and communication data. These data can be analyzed directly for empirical patterns. They can also be used as inputs to ABA models, as well as in model calibration and validation. Data-driven methods are also used in mobility mining: open-ended exploration of moving objects data to search for novel hypotheses.

Space–Time Paths

The basic conceptual entity in ABA is the fundamental time geographic entity, the space–time path, and its extension, the space–time prism.

[Fig. 2 A space–time path among activity locations]

The space–time path represents the actual mobility (recorded or simulated) of an entity moving in geospace with respect to time. (A geospace is a low-dimensionality space – usually three dimensions or fewer – where distances between location pairs represent shortest path relations in some real-world geography.) Figure 2 illustrates a space–time path between four activity locations in geographic space (the latter conceptualized as tubes with locations in space and extents in time reflecting their availability). Semantically, the path is a continuous mapping from time to geospace. In practice, data are typically a sequence of sampled locations strictly ordered by time.

Traditionally, these data were collected using recall methods such as travel–activity diaries, or prospective methods such as experiments where study participants solve contrived activity and travel scheduling problems. These traditional data collection methods are fraught with problems, including nonparticipation biases, recall biases, and accidental or willful inaccuracies (in the case of travel diaries) as well as difficulties in creating meaningful scenarios (in the case of prospective methods). LATs such as assisted GPS technologies in smartphones allow more accurate and higher volume data collection to support space–time path reconstruction. However, this often comes at the expense of path semantics, such as the context for the mobility episode including the planned and executed activities. Semantics can be recovered by overlaying paths with high-resolution georeferenced land-use and infrastructure data. This method can produce errors related to data inaccuracies and activity ambiguities (e.g., what is a person doing while in a coffee house – dining, working, socializing, or some combination of the above?).
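Given such a time-ordered sequence of fixes, positions between samples can be reconstructed; linear interpolation is the standard method, as discussed below. A minimal Python sketch, with invented coordinates and timestamps:

from bisect import bisect_right

def position(path, t):
    """Interpolated (x, y) location on a space-time path at time t.

    path: list of (t, x, y) fixes sorted by time, with
    path[0][0] <= t <= path[-1][0].
    """
    times = [fix[0] for fix in path]
    i = bisect_right(times, t)
    if i == 0:
        raise ValueError("t precedes the first fix")
    if i == len(path):
        return path[-1][1], path[-1][2]  # t equals the final timestamp
    (t0, x0, y0), (t1, x1, y1) = path[i - 1], path[i]
    w = (t - t0) / (t1 - t0)             # fraction of the segment elapsed
    return x0 + w * (x1 - x0), y0 + w * (y1 - y0)

# Hypothetical GPS fixes: (seconds, easting, northing)
path = [(0, 0.0, 0.0), (60, 300.0, 400.0), (120, 300.0, 1000.0)]
print(position(path, 30))  # (150.0, 200.0): halfway along the first leg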


The sequence of sample locations can be generated in several ways depending on the data collection method (Andrienko et al. 2008; Ratti et al. 2006):

- Event-Based Recording: Time and location are recorded when a specified event occurs; this is typical of traditional diary methods but also characterizes data from cell phones, for example, a person calling from a mobile phone generating a location sample.
- Time-Based Recording: Mobile object positions are recorded at regular time intervals; this is typical of GPS and related technologies.
- Change-Based Recording: A record is made when the position of the object is sufficiently different from the previous location; this includes dead-reckoning methods as well as mobile objects database technologies that avoid recording some locations to manage data volume.
- Location-Based Recording: Records are made when the object comes close to specific locations where sensors are located; examples include radiofrequency identification and Bluetooth sensors.

The path must be reconstructed from the temporally ordered sequence of sample locations. The standard method is linear interpolation between temporally adjacent sample points. This requires the fewest additional assumptions but admits physically unrealistic motions such as infinite acceleration and deceleration at sharp corners. Interpolation via Bézier curves generates a smoother, more physically realistic space–time path (Macedo et al. 2008; Miller 2005a).

Three types of error occur in space–time paths. Measurement error refers to error in the recorded locations or timestamps. This is equivalent to the well-studied problem of measurement error in polylines in geographic information science. Sampling error refers to capturing a continuously moving object using discrete sampling. One way to deal with this is to treat the unobserved segments between sampled locations as an uncertainty region delimiting possible locations for the object between observations (Macedo et al. 2008). This is equivalent to another fundamental time geographic concept, the space–time prism, to be discussed below. Combined measurement and sampling error comprises the third type of space–time path error; this is equivalent to measurement error in a space–time prism since under these conditions the space–time path is a sequence of linked, imperfectly measured space–time prisms.

Space–time paths contain many properties that are useful for understanding human mobility behavior. Analytical methods for paths include (Andrienko et al. 2008; Long and Nelson 2013):

- Path Descriptors include both moment-based descriptors (such as the time, location, direction, and speed at any moment) and interval-based descriptors (such as the minimum, maximum, and mean speed; the distribution and sequence of speeds and directions; and the geometric shape of the path over some time interval).
- Path Comparison Methods allow quantitative comparisons among space–time paths, particularly with respect to geometric similarity in space–time and with respect to semantics (such as the sequence of locations visited). Methods include path distance measures such as the Fréchet distance and sequence measures such as least common subsequences.
- Pattern and Cluster Methods for identifying synoptic spatial-temporal patterns from large collections of mobile objects.
- Individual-Group Dynamic Methods for characterizing collective movement behavior such as flocking, for example, methods that examine the relative motions among mobile objects.
- Spatial Field Methods for translating movement patterns of objects into fields or surfaces that summarize mobility and activity frequency by geographic location.
- Spatial Range Methods for identifying and characterizing the geographic area that contains the observed mobility of one or more mobile objects.

Long and Nelson (2013) provide a succinct but comprehensive review of these methods.
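As a concrete instance of a path comparison method, the discrete Fréchet distance between two sampled paths can be computed with a short dynamic program. A sketch over two hypothetical point sequences:

from functools import lru_cache
from math import dist

def discrete_frechet(p, q):
    """Discrete Frechet distance between two point sequences."""
    @lru_cache(maxsize=None)
    def c(i, j):
        d = dist(p[i], q[j])
        if i == 0 and j == 0:
            return d
        if i == 0:
            return max(c(0, j - 1), d)
        if j == 0:
            return max(c(i - 1, 0), d)
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)
    return c(len(p) - 1, len(q) - 1)

a = [(0, 0), (1, 0), (2, 0), (3, 0)]
b = [(0, 1), (1, 1), (2, 2), (3, 1)]
print(discrete_frechet(a, b))  # 2.0: the worst matched-point separation

Intuitively, the measure is the shortest "leash" that lets two walkers traverse their respective paths monotonically, so it captures geometric similarity while respecting the ordering of the points.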

Space–Time Prisms

The space–time prism represents potential mobility: it delimits possible locations for a space–time path during some unobserved time interval. Figure 3 illustrates a planar space–time prism.

[Fig. 3 A planar space–time prism]

A prism can have two interpretations. As noted above, the prism can be an uncertainty region for an under-sampled space–time path. In contrast, Hägerstrand (1970) conceptualized the prism as a measure of space–time accessibility.

The prism encompasses all locations that can be reached during the unobserved time interval given constraints on the object's speed. Activities, conceptualized as tubes at specific locations with limited extent in time (see Fig. 2), must intersect the prism to a sufficient degree (at least as long as the minimal activity time) for the activity to be feasible for that person at that time and location. The prism is difficult to state analytically over the entire interval of its existence. However, it is tractable to define the prism's spatial extent at a moment in time as the intersection of only two of three simple spatial regions (Miller 2005a). It is also possible to define space–time prisms within transportation networks (Kuijpers and Othman 2009). Figure 4 illustrates a network time prism: the accessible locations within the planar network and the corresponding spatiotemporal region comprising the complete network time prism.

[Fig. 4 A network time prism]

In addition to being the envelope for possible space–time paths, these paths also give the prism an internal structure, including unequal visit probabilities within the interior (Winter and Yin 2011). Prisms contain error propagated from the measured space–time anchors and object speed limits. Error distributions can be numerically generated through Monte Carlo simulation: generate many realizations of the prism and analyze the resulting data. This is a tractable approach for theoretical investigation but is not scalable to practical applications. Alternatively, it is possible to derive analytical characterizations of prism and prism–prism intersection error in planar space using spatial error propagation theory and implicit function techniques applied to the intersection of circles and ellipses. However, some intersection cases are still open, and the approach is not scalable beyond pairs of prisms. Further investigation into tractable error approximations based on spatial error propagation methods is required (Kobayashi et al. 2011). More tractable are uncertain network time prisms based on spatiotemporal probability regions (not necessarily connected) for anchor locations and times within the network (Kuijpers et al. 2010).
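The prism's momentary spatial extent, noted above as the intersection of simple regions, can be made concrete: in the plane, the cross-section at time t is the intersection of a disk reachable from the first anchor by t and a disk from which the second anchor is still reachable by its deadline. A minimal membership test in Python; the anchors, speed, and query points are invented, and the sketch ignores the minimum stationary activity time and any cap on the potential path area.

from math import dist

def in_prism(z, t, anchor1, anchor2, vmax):
    """Is planar location z inside the space-time prism at time t?

    anchor1 = (x1, y1, t1): first anchor (departure location and time)
    anchor2 = (x2, y2, t2): second anchor (arrival location and deadline)
    vmax: maximum travel speed
    """
    x1, y1, t1 = anchor1
    x2, y2, t2 = anchor2
    if not t1 <= t <= t2:
        return False
    reachable_from_first = dist(z, (x1, y1)) <= vmax * (t - t1)
    can_reach_second = dist(z, (x2, y2)) <= vmax * (t2 - t)
    return reachable_from_first and can_reach_second

# Leave (0, 0) at t=0, be at (8, 0) by t=10, top speed 1: the midpoint is
# inside the prism at t=5, but a point too far off the axis is not.
print(in_prism((4, 0), 5, (0, 0, 0), (8, 0, 10), 1.0))  # True
print(in_prism((4, 4), 5, (0, 0, 0), (8, 0, 10), 1.0))  # False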

Prisms can be used as inputs to activity models, in particular for choice set or feasible activity set delimitation. Prism-based measures provide vividly different portrayals of accessibility across social, gender, and cultural dimensions relative to traditional place-based measures that tend to mask these differences (Kwan 1998). Prisms can capture activity time constraints within accessibility measures that are consistent with spatial choice, spatial interaction, and consumer surplus theory (Miller 1999). Path–prism and prism–prism intersections represent potential interaction between two mobile objects. Both can be solved in planar space for a moment in time (Miller 2005a). Scalable techniques also exist for network prism intersections (Kuijpers and Othman 2009). Prism–prism intersections are also useful for capturing the possibility of joint activity behavior in activity-based measures and models (Neutens et al. 2007).

The space–time prism focuses on physical accessibility in geographic space or transportation networks. Path and prism concepts have been extended to encompass interactions within cyberspace (the virtual space implied by networked ICTs). Interaction and accessibility in cyberspace can be treated as direct relationships among space–time paths and prisms (Yu and Shaw 2008) or as indirect relationships mediated by access to communication technologies (Miller 2005b). It is also possible to treat the STP as existing in a hybrid geo/information space (Couclelis 2009).

Mobility Mining

Increasing capabilities for collecting and processing mobile objects data are leading to the emergence of mobility mining as a new area of research. Mobility mining leverages mobile objects databases with advances in data mining techniques to create a knowledge discovery process centered on the analysis of mobility with explicit reference to geographic context. Mobility mining involves three major phases (Giannotti and Pedreschi 2008):

- Trajectory Reconstruction from raw mobile objects data. The basic problem was discussed above; the specific problem in this context is to reconstruct trajectories from massive mobile objects data, especially when the data are collected using different methods and sampling methods/rates. This may involve preprocessing steps such as data selection, cleaning, and integration with other geographic and sociodemographic data.
- Pattern Extraction involves using spatiotemporal data mining methods to discover interesting (novel, valid, understandable, and useful) patterns in the reconstructed trajectories. Types of patterns include clusters, frequencies, classifications, summary rules, and predictive models.
- Knowledge Delivery involves verifying and interpreting the discovered patterns, integrating these patterns with background knowledge, and communicating this information to support scientific and applied decision-making.

Mobility mining and knowledge discovery from mobile objects databases are hypothesis-generation processes that should lead to more focused and conclusive investigation. These techniques and processes play roles in the scientific process similar to instrumentation such as a telescope, microscope, or supercollider: they allow analysts to see empirical phenomena that would otherwise be obscured or difficult to detect. Empirical patterns discovered during the data mining process are tentative until they have been verified using confirmatory statistics and interpreted in light of background knowledge and theory.

5 Activity-Based Modeling

Although theoretically and evidentially suspect, the trip-based approach offers a significant strength: it is relatively straightforward to build scalable comprehensive models of transportation and urban systems that are easily calibrated, verified, summarized, and mapped. It is more challenging to build, verify, and digest comprehensive models built from the micro-level. LAT-based data, geometric growth in computing power, and the hard work of some very smart people are making activity-based models more realistic, powerful, and understandable. Consequently, ABA is being increasingly applied in policy and planning analysis in Europe, the United States, and other locations.

Depending on the system being modeled, activity-based models can encompass a large number of decision variables over a wide range of temporal and spatial granularities and time frames. In addition, activity-based models are often components in broader comprehensive urban models and linked human-physical process models. Possible components of activity-based models include (Ben-Akiva and Bowman 1998):

- Activity Implementation involving the execution and possible rescheduling of activity, travel, and communication plans based on empirical conditions in real time. This includes decisions such as mode and route choice, but also fine-grained context-specific behaviors such as speed, acceleration, merging and car-following behavior in automobiles, bicycling behaviors such as obeying stop signs, and pedestrian behavior within crowded environments.
- Activity Scheduling includes activity selection, activity assignment within household and other social networks, activity scheduling, selection of activity locations, and methods and times for mobility. These events occur frequently and regularly at time scales ranging from real time to hourly, daily, weekly, monthly, seasonally, and annually.
- Sociodemographic Systems include work, residence, ownership, and other life-altering personal, social, and economic decisions and events such as having children or buying a bicycle. These occur infrequently at scales from annual to decadal.
- Urban, Social, and Economic Systems include the infrastructure, services, institutions, and social and built environments that influence implementation, activity, and lifestyle decisions. These systems operate from real time (e.g., traffic conditions) through annual (e.g., housing dynamics) to decadal and beyond (e.g., compact versus sprawled cities).
- Physical Systems include material, energy, hydrologic, biological, atmospheric, and other environmental systems that affect and are affected by the other activity domains. These operate at time scales from real time (e.g., air quality) to geologic (e.g., climate change).


Activity-based models slice, dice, and combine these components in different ways depending on the modeling domain and scope, as well as the strengths and weaknesses of the particular technique. Major types of activity-based modeling techniques are (i) econometric models, (ii) optimization methods, (iii) computational process models, and (iv) microsimulation and agent-based models. Some of these approaches can also be used in combination, for example, econometric models as a component of a larger microsimulation model or a computational process model used to derive agent behavior in an agent-based model.

Econometric Models

Econometric models are among the oldest activity-based modeling strategies, resulting from extending trip-based econometric models to encompass activity choice and trip-chaining behavior. These models have their foundation in the microeconomic theory of consumer choice. They require specifying relationships between individual attributes, environmental factors, and activity–travel decisions in the form of a utility function whose parameters are estimated from empirical data, assuming utility-maximizing choices. Econometric models of activity–travel behavior are often in the form of discrete choice models such as multinomial and nested logit models. Figure 5 provides an example of a nested logit representation of activity–travel behavior (after Ben-Akiva and Bowman 1998).

[Fig. 5 Nested logit representation of activity–travel behavior: a daily activity pattern branches into primary tour (timing, destination, and modes) and secondary tour (timing, destination, and modes) choices]
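To make the discrete choice machinery concrete, the following sketch computes multinomial logit choice probabilities from linear-in-parameters utilities. The mode attributes and coefficients are invented for illustration; an operational activity-based system nests many such models, as in Fig. 5.

from math import exp

def mnl_probabilities(utilities):
    """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j)."""
    vmax = max(utilities.values())  # subtract the max to stabilize exp()
    expv = {alt: exp(v - vmax) for alt, v in utilities.items()}
    denom = sum(expv.values())
    return {alt: ev / denom for alt, ev in expv.items()}

# Invented systematic utilities V = b_time*time + b_cost*cost for a
# commute mode choice; attributes are (minutes, dollars) per alternative.
b_time, b_cost = -0.05, -0.10
alts = {"car": (25, 6.0), "bus": (45, 2.5), "bike": (40, 0.0)}
V = {mode: b_time * t + b_cost * c for mode, (t, c) in alts.items()}
print(mnl_probabilities(V))  # roughly: car 0.42, bus 0.22, bike 0.36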

Other nesting structures are possible depending on which activity facets are being analyzed. More elaborate econometric structures are also used, such as structural equations, hazard-based duration models, and ordered response models (Ben-Akiva and Bowman 1998; Pinjari and Bhat 2011). Advantages of econometric models are a rigorous theoretical foundation and mature methodologies for model specification, calibration, and validation. Weaknesses include the empirically suspect assumption that individuals are perfectly rational utility maximizers and the lack of an explicit process theory to describe activity–travel decision-making (Timmermans et al. 2002).

Optimization Methods

Finding an ideal activity pattern based on criteria such as time, cost, and utility is similar to the problem of finding optimal tours through a transportation network with scheduled pickups and deliveries (Recker 1995). There is a large literature in operations research and management science on problems such as assignment, scheduling, and routing subject to time windows. These are complex combinatorial problems, but computational search methods have become very sophisticated and powerful. This is a normative approach: the idea is not to replicate real-world behavior but rather to generate ideal patterns that can be used as benchmarks for evaluating real-world behavior with respect to efficiency. These comparisons can help identify empirical factors and heuristics that cause people to deviate from ideal patterns.

Rule-Based Models

Computational process models (CPMs) are systems of condition–action pairs (semantically expressed as "if–then" rules) that describe the activity–travel decision process in some empirical domain. Decision rules are often organized according to different subcomponents of the activity system. However, most CPMs focus on activity scheduling and implementation (e.g., Recker et al. 1986). Rules can be derived informally from intuition and knowledge based on previous research. Rules can also be inferred from empirical data using data mining techniques such as decision tree induction and association rules (Arentze et al. 2000). CPMs are highly flexible, allowing a wide range of heuristics that better represent decision-making in the real world. However, a weakness is the difficulty in enumerating the large number of rules required for even a modest activity scheduling and implementation problem. CPMs also do not have a mature theory and techniques for testing variables and distinguishing between good and bad models (Buliung and Kanaroglou 2007; McNally and Rindt 2007).

Microsimulation and Agent-Based Models

Microsimulation and agent-based models are computer-based methods for predicting the evolution of a complex system. Microsimulation refers to computer-based modeling of phenomena at the disaggregate level to better understand complex dynamics at the aggregate level. Microsimulation has a long tradition in social science, dating back to attempts to model the US economy in the 1950s with household and firm behavior as the fundamental units.

Microsimulation models tend to fall into two categories. Static models typically rely on cross-sectional data and produce no change to the structure of the cross section (e.g., internal composition, sample size) as the model executes over time. Dynamic models rely on cross-sectional or longitudinal data and produce changes to the total number of micro-units. Dynamic models are used to forecast and track modifications of entities over longer time periods than static models (Buliung and Kanaroglou 2007).

Agent-based modeling (ABM) is closely related to microsimulation but has a stronger conceptual foundation. ABM views systems as collections of autonomous, adaptive, and interacting agents. An agent is an independent unit that tries to fulfill a set of goals in a dynamic environment. An agent is autonomous if its actions are independent (i.e., it makes decisions without an external controlling mechanism) and adaptive if its behavior can improve over time through a learning process. Agents interact by exchanging physical resources and information and/or by reacting to presence or proximity. ABM describes a system from the perspective of its constituent units' activities; this is appropriate when individual behavior cannot be described adequately through aggregate rules and activities are a more natural way of describing the system than processes (Bonabeau 2002).

The distinction between microsimulation, ABM, and the rule-based techniques discussed previously can be vague, particularly in practice. Rule-based methods can be used to drive agent behaviors and microsimulations, and agents can be a central component of broader microsimulation models (e.g., Arentze et al. 2000); a toy sketch combining both ideas follows below. It is also possible to link these models with dynamic microscale traffic models to simulate the interrelationships among transportation demand, transportation system performance, and activity scheduling/implementation (see Bekhor et al. 2011).

Advantages of microsimulation and ABM include the explicit representation of micro-level behaviors and processes, the ability to develop and test behavioral theory, better understanding of macro-level processes produced by individual-level behaviors, maintaining the heterogeneity of information (such as individual identity) during simulation, minimization of model bias, better policy sensitivity, integration of processes operating at different temporal scales, and improved model transferability (Buliung and Kanaroglou 2007). Disadvantages include a lack of mature methodologies for calibration and validation, although these models lend themselves to expert engagement and judgment better than traditional, analytical models (Bonabeau 2002). It can also be difficult to make sense of microsimulation models and ABMs: these methods essentially generate a large dataset that must be explored and analyzed. This can be challenging since good scientific practice requires a careful experimental design for parameters that are not empirically derived. The design should vary some parameters systematically while holding others fixed to assess the simulation outcomes, often with multiple simulation runs for each parameter combination to eliminate artifacts from random number generators. This can generate a huge volume of simulated results, particularly if there is a large number of parameters and parameter levels to explore.
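The toy sketch below illustrates how CPM-style if–then rules can drive agents in a microsimulation, with individual schedules aggregated into macro-level activity counts. The rules, agent attributes, and probabilities are invented for illustration and are far simpler than operational systems such as ALBATROSS (Arentze et al. 2000).

```python
import random

class Traveler:
    """Toy agent whose daily schedule is built from if-then rules (CPM style)."""
    def __init__(self, worker, has_car):
        self.worker = worker
        self.has_car = has_car
        self.schedule = []

    def plan_day(self):
        # Rule 1: workers insert a mandatory work activity first
        if self.worker:
            self.schedule.append(("work", "09:00-17:00"))
        # Rule 2: shop on the way home if a car is available, otherwise locally
        if self.has_car:
            self.schedule.append(("shop_en_route", "17:15-17:45"))
        else:
            self.schedule.append(("shop_local", "18:00-18:30"))
        # Rule 3: discretionary leisure happens with a fixed probability
        if random.random() < 0.4:
            self.schedule.append(("leisure", "19:30-21:00"))

# Microsimulation: run the rule base over a synthetic population and
# aggregate individual schedules into macro-level activity counts
random.seed(1)
population = [Traveler(worker=random.random() < 0.6,
                       has_car=random.random() < 0.7) for _ in range(1000)]
counts = {}
for person in population:
    person.plan_day()
    for activity, _ in person.schedule:
        counts[activity] = counts.get(activity, 0) + 1
print(counts)  # e.g., how many simulated agents shop en route vs. locally
```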

6 Frontiers in Activity-Based Analysis

Much progress has been accomplished in ABA, and this progress is likely to continue as favorable policy, computational, and data environments help scientists and practitioners propel it forward intellectually. This section briefly discusses major research frontiers in ABA.

Social Networks

Social networks are at the heart of time geography and ABA: space–time paths bundle to conduct shared activities, prisms intersect to allow this possibility, households are a fundamental unit for activity organization and sharing, and activity coordination and adjustments cascade through broader activity and social systems. Time geography and ABA thus constitute an ecological approach to transportation, cities, and societies as a complex web of interconnections (Pred 1977; Ellegård and Svedin 2012). Capturing social network influences on activity, mobility, and communication behavior is a very active frontier in ABA (Neutens et al. 2008).

A major challenge in capturing social networks in ABA concerns basic definition, measurement, and data collection. Social networks can range from a few intimate individuals to hundreds of Facebook friends, and all of these networks can be relevant to activity behavior depending on the context. Measuring social networks is also difficult, particularly identifying the more genuine and enduring ties. Social influence within these networks can also vary depending on formal and informal relations. Finally, social networks have complex topologies, such as small-world configurations, that can generate complex dynamics.

LATs and social media can inform social networks in ABA. As mentioned above, path–path, path–prism, and prism–prism relationships indicate the possibility of social interaction, and methods for analyzing collective mobile objects data are improving. Problems include dealing with coincidental proximity (e.g., friends versus strangers in a coffeehouse) and activity ambiguity (e.g., a coffeehouse again). Location data error is also a challenge: it can be substantial for some LATs in some environments (e.g., GPS receivers in city centers, cellular network location in rural areas).

Social media are convincing millions of people to share details of their lives online. The implications of these data for understanding and predicting activity, travel, and communication behavior should be obvious, including the fact that people use these media to plan and coordinate activities. Challenges include nonrepresentative samples and unstructured data. Social media participants are not scientifically sampled, nor do people share everything about their lives (with some notable exceptions). Nevertheless, the massive size of these databases makes them valuable. Social media data are also unstructured, comprising nonquantitative data such as text and imagery. Intriguingly, these data are increasingly georeferenced due to social media applications in smartphones. One way to treat these data is from a mobility mining perspective: use social media data to generate hypotheses that can be tested with more focused, confirmatory techniques and scientifically sampled or experimental data. A minimal example of testing a path–path relationship follows below.
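The sketch below flags the time steps at which two synchronously sampled traces come within a spatial threshold of one another, a necessary but not sufficient sign of interaction. The coordinates, the planar distance metric, and the 50 m threshold are illustrative assumptions, and, as noted above, spatial proximity alone cannot distinguish friends from strangers in a coffeehouse.

```python
import math

def colocated(path_a, path_b, dist_m=50.0):
    """Return time indices where two synchronously sampled paths are
    within dist_m of each other (candidate path-path bundling)."""
    hits = []
    for t, ((xa, ya), (xb, yb)) in enumerate(zip(path_a, path_b)):
        if math.hypot(xa - xb, ya - yb) <= dist_m:
            hits.append(t)
    return hits

# Planar coordinates in meters, one fix per time step (illustrative data)
alice = [(0, 0), (40, 30), (90, 60), (150, 80)]
bob   = [(200, 0), (120, 40), (95, 70), (160, 90)]
print(colocated(alice, bob))  # -> [2, 3]: the paths bundle in the last two steps
```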

Unfortunately, access to LAT and social media data can be circumscribed for proprietary and competitive reasons. The danger is a computational approach that revolutionizes the social sciences, but only as practiced in private sector companies and secret government agencies (Lazer et al. 2009).

Big Data and Knowledge Delivery

Big Data refers to data with high volume (massive databases with millions of records), high variety (structured and unstructured data), and high velocity (often arriving in real time). The Big Data mantra is to keep all of these data since they may be useful; the astonishing collapse in data storage costs over the past two decades makes this possible. In many locations in the world, we are moving toward sensed transportation systems with sensors embedded in infrastructure and vehicles, as well as high-resolution but remotely positioned sensors such as LiDAR. These data, combined with consumer LAT data and social media, will generate orders of magnitude more data about transportation and cities than currently exist. A previous section of this chapter discussed the role of mobility mining in ABA. Research frontiers include not only dealing with massive transportation, mobility, and communication data but also delivering actionable knowledge to decision-makers sufficiently fast that they can act before the knowledge becomes irrelevant. This is a challenging frontier that involves elements of exploratory and confirmatory analysis as well as decision support.

Big Data also has the potential to create more collaborative transportation and social systems. This is a major motivation behind IBM's Smarter Planet initiatives. Collaborative transportation systems can range from ride/vehicle sharing to long-term strategic decision-making about transportation and urban futures. The challenge is to create not only the knowledge delivery techniques discussed in the previous paragraph but also the tools and environments for sharing, collaboration, and collective governance.

Locational Privacy

The benefits of an ABA reinvigorated through more data and computational power may not be realized if there is a public backlash due to abuses of these data. Locational privacy is the concept that the space–time signature comprising a person's activity pattern can reveal much about that person and her/his activities. This is a fundamental change: as the United States Supreme Court commented in a recent decision, LATs provide not isolated facets but a person's entire life. Locational privacy protection strategies include regulation, privacy policies, anonymity, and obfuscation. Regulation and privacy policies define unacceptable uses of location data. Anonymity detaches locational data from an individual's identity. Obfuscation techniques degrade locational data through deliberate undersampling, aggregation, introducing measurement error, or some combination of these; a small sketch of such techniques follows below. Scientific challenges include new research ethics protocols for dealing with location data, especially user-generated content and remote but high-resolution sensors that can reveal things and activities previously considered private. Another scientific challenge is dealing with deliberately degraded locational data; spatial and spatiotemporal error methods for mobile objects data are still largely lacking.
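The obfuscation strategies just mentioned might look like the following sketch. The 500 m grid cell, the noise scale, and the sampling rate are arbitrary illustrative choices, not recommended privacy parameters.

```python
import random

def snap_to_grid(x, y, cell_m=500.0):
    """Aggregation: report only the center of the grid cell containing (x, y)."""
    return ((x // cell_m) * cell_m + cell_m / 2,
            (y // cell_m) * cell_m + cell_m / 2)

def add_noise(x, y, scale_m=100.0):
    """Deliberate measurement error: perturb the fix with Gaussian noise."""
    return (x + random.gauss(0, scale_m), y + random.gauss(0, scale_m))

def undersample(path, keep_every=5):
    """Deliberate undersampling: drop all but every k-th fix."""
    return path[::keep_every]

random.seed(7)
fix = (1234.0, 5678.0)         # a raw planar fix in meters
print(snap_to_grid(*fix))      # -> (1250.0, 5750.0)
print(add_noise(*fix))         # a noisy version of the same fix
print(len(undersample([fix] * 12)))  # -> 3 of 12 fixes retained
```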

More generally, societies need to have conversations about the acceptable and unacceptable uses of these data if the remarkable progress these data have enabled in building better transportation systems and communities is to continue.

7 Conclusion

Activity-based analysis (ABA) is emerging as the dominant approach in transportation science and planning (Timmermans et al. 2002). It is a theoretically sound approach to transportation, cities, societies, and human–physical systems that takes a person's activities in time and space as its foundation. Changes in policy are encouraging a wider view of transportation, and the increasing availability of individual mobility data, together with the scientific advances inspired by this favorable environment, is making ABA methods scalable to realistic scenarios and problems.

Data-driven methods allow high-resolution measurement of fundamental ABA entities such as the space–time path (representing actual mobility) and the space–time prism (representing potential mobility, interpreted as path sampling error or space–time accessibility). There is a wide range of methods for measuring, comparing, and summarizing collections of space–time paths, but fewer methods for the space–time prism. These data can be used for empirical investigation, for mobility data mining, and as inputs to ABA modeling.

ABA models attempt to solve or simulate activity behavior. Most models focus on the activity scheduling and implementation problems. These ABA core models can be linked with transportation system performance models to capture the dynamics of mobility demand and system response. The core models can also be embedded in broader models of cities, sociodemographics, and physical systems such as airsheds. Major modeling approaches include econometric models, optimization methods, computational process models, and microsimulation/agent-based models.

There are several ABA research frontiers; these include social networks, delivering knowledge in the face of Big Data, and locational privacy. Progress along these frontiers will support the continuing rise of ABA in understanding and planning transportation and related systems.

Acknowledgments Dr. Walied Othman (University of Zurich) provided the Mathematica code to generate the network time prism (Fig. 4); this is available at http://othmanw.submanifold.be/. Ying Song (Ohio State University) generated some of the graphics. Ying Song and Calvin Tribby (Ohio State University) provided valuable comments on this chapter.

References

Andrienko N, Andrienko G, Pelekis N, Spaccapietra S (2008) Basic concepts of movement data. In: Giannotti F, Pedreschi D (eds) Mobility, data mining and privacy. Springer, Heidelberg, pp 15–38
Arentze T, Hofman F, van Mourik H, Timmermans H (2000) ALBATROSS: multiagent, rule-based model of activity pattern decisions. Transp Res Rec 1706:136–144
Bekhor S, Dobler C, Axhausen K (2011) Integration of activity-based and agent-based models: case of Tel Aviv, Israel. Transp Res Rec 2255:38–47
Ben-Akiva M, Bowman JL (1998) Activity based travel demand model systems. In: Marcotte P, Nguyen S (eds) Equilibrium and advanced transportation models. Kluwer, Boston, pp 27–46
Bonabeau E (2002) Agent-based modeling: methods and techniques for simulating human systems. Proc Natl Acad Sci 99(suppl 3):7280–7287
Buliung RN, Kanaroglou PS (2007) Activity–travel behaviour research: conceptual issues, state of the art, and emerging perspectives on behavioural analysis and simulation modeling. Transp Rev 27(2):151–187
Chapin FS (1974) Human activity patterns in the city: things people do in time and space. Wiley, London
Couclelis H (2009) Rethinking time geography in the information age. Environ Plan A 41(7):1556–1575
Ellegård K, Svedin U (2012) Torsten Hägerstrand's time-geography as the cradle of the activity approach in transport geography. J Transp Geogr 23:17–25
Giannotti F, Pedreschi D (2008) Mobility, data mining and privacy: a vision of convergence. In: Giannotti F, Pedreschi D (eds) Mobility, data mining and privacy. Springer, Heidelberg, pp 1–11
Hägerstrand T (1970) What about people in regional science? Pap Reg Sci Assoc 24(1):6–21
Jones PM (1979) New approaches to understanding travel behaviour: the human activity approach. In: Hensher DA, Stopher PR (eds) Behavioral travel modeling. Croom-Helm, London, pp 55–80
Kobayashi T, Miller HJ, Othman W (2011) Analytical methods for error propagation in planar space-time prisms. J Geogr Syst 13(4):327–354
Kuijpers B, Othman W (2009) Modeling uncertainty of moving objects on road networks via space-time prisms. Int J Geogr Inform Sci 23(9):1095–1117
Kuijpers B, Miller HJ, Neutens T, Othman W (2010) Anchor uncertainty and space-time prisms on road networks. Int J Geogr Inform Sci 24(10):1223–1248
Kwan M-P (1998) Space–time and integral measures of individual accessibility: a comparative analysis using a point-based framework. Geogr Anal 30(3):191–216
Lazer D, Pentland A, Adamic L, Aral S, Barabási A-L, Brewer D, Christakis N, Contractor N, Fowler J, Gutmann M, Jebara T, King G, Macy M, Roy D, Van Alstyne M (2009) Life in the network: the coming age of computational social science. Science 323(5915):721–723
Long JA, Nelson TA (2013) A review of quantitative methods for movement data. Int J Geogr Inform Sci 27(2):292–318
Macedo J, Vangenot C, Othman W, Pelekis N, Frentzos E, Kuijpers B, Ntoutsi I, Spaccapietra S, Theodoridis Y (2008) Trajectory data models. In: Giannotti F, Pedreschi D (eds) Mobility, data mining and privacy. Springer, Heidelberg, pp 123–150
McNally MG, Rindt CR (2007) The activity-based approach. Working paper UCI-ITS-AS-WP-071, Institute of Transportation Studies, University of California-Irvine
Miller HJ (1999) Measuring space-time accessibility benefits within transportation networks: basic theory and computational methods. Geogr Anal 31(2):187–212
Miller HJ (2005a) A measurement theory for time geography. Geogr Anal 37(1):17–45
Miller HJ (2005b) Necessary space-time conditions for human interaction. Environ Plan B Plan Design 32:381–401
Neutens T, Witlox F, van de Weghe N, De Maeyer P (2007) Space-time opportunities for multiple agents: a constraint-based approach. Int J Geogr Inf Sci 21(10):1061–1076
Neutens T, Schwanen T, Witlox F, De Maeyer P (2008) My space or your space? Towards a measure of joint accessibility. Comput Environ Urban Syst 32(5):331–342
Pinjari AR, Bhat CR (2011) Activity-based travel demand analysis. In: de Palma A, Lindsey R, Quinet E, Vickerman R (eds) Handbook in transport economics. Edward Elgar, Cheltenham, pp 213–248
Pred A (1977) The choreography of existence: comments on Hägerstrand's time-geography and its usefulness. Econ Geogr 53(2):207–221
Ratti C, Pulselli RM, Williams S, Frenchman D (2006) Mobile landscapes: using location data from cell phones for urban analysis. Environ Plan B 33(5):727–748
Recker WW (1995) The household activity pattern problem: general formulation and solution. Transp Res B 29(1):61–77
Recker WW, McNally MG, Root GS (1986) A model of complex travel behavior: Part I. Theoretical development. Transp Res Part A 20(4):307–318
Timmermans HJP, Arentze T, Joh C-H (2002) Analyzing space-time behavior: new approaches to old problems. Prog Hum Geogr 26(2):175–190
Winter S, Yin Z-C (2011) The elements of probabilistic time geography. GeoInformatica 15(3):417–434
Yu H, Shaw S-L (2008) Exploring potential human activities in physical and virtual spaces: a spatio-temporal GIS approach. Int J Geogr Inform Sci 22(4):409–430

Social Network Analysis

Richard Medina and Nigel Waters

Truth is such a difficult problem that most people don't see any problem in it at all.
Friedrich Dürrenmatt

I know that for a fact . . .
Anonymous

Contents
1 Introduction
2 Social Media as Social Networks
3 The Leading Social Media/Social Networking Sites
3.1 Analysis of Facebook's Growth
3.2 Analysis of Google's Growth
3.3 Google–Facebook Competition
4 Social Networking Sites–Print Media Competition
5 Location and Social Network Technology
5.1 Regional Contextual Implications for the Formation of Online Social Networks
5.2 Hyperlocal Marketing in Social Networks
6 Antitrust Action and Social Networks
7 Quantitative Social Network Analytics
7.1 Measures of Structure and Visualization
7.2 Procedures for Content Analysis
8 How Value Is Generated with a Social Network
8.1 Microtargeting and Fake News
9 Analyzing Fake News on Social Networks: Regional Science Implications
10 Social Networks and Social Media for Illegal Activities
11 Conclusion
12 Cross-References
References

R. Medina
Department of Geography, University of Utah, Salt Lake City, UT, USA
e-mail: [email protected]

N. Waters (*)
Department of Geography, University of Calgary, Calgary, AB, Canada
e-mail: [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_49


Abstract

This chapter elucidates how social network analytics influence regional science. The current state of the art in social network analysis, both quantitative and qualitative, is examined. The chapter explains how social network analysis can be, and is, used to influence and to understand the spatial manifestations of economic and political activity across regions in continuous and network space. The major social networks now extant are described, and their current activities are discussed, including the topics of microtargeting, transparency, privacy, proprietary algorithms, and "fake news" and its synonyms such as post-truth. Policy instruments to influence and govern these activities, such as taxation and regulation, are reviewed, and their regional economic implications are discussed.

Keywords

Social networks · Fake news · Network science · Geodemographic · Microtargeting · Data mining

Abbreviations

API  Application programming interfaces
AT&T  American Telephone and Telegraph
BBC  British Broadcasting Corporation
EU  European Union
FN  Fake News
GDPR  General Data Protection Regulation
IC  Independent cascade
IM  Influence maximization
LT  Linear threshold
NYT  New York Times
RS  Regional science
SM  Social media
SM/SN  Social Media/Social Networks
SN  Social network
SNA  Social network analysis
SQL  Structured query language
UGC  User-generated content
VR  Virtual Reality

1 Introduction

In an earlier chapter on Social Network Analysis (Waters 2014), in the first edition of this text, there was considerable discussion of the foundational history of network science from the 1950s through to the 1990s and into the 2000s. The origins of social networks and their rise in importance as widely adopted and influential computer-based platforms were explained.

In addition, the influence of social networks and media on economic activity and regional science, and the so-called death of distance, were reviewed. This chapter extends those narratives and discussions and merges them with developments that have occurred over the last decade.

Here we define a social network as a social structure of people or organizations linked together by ties that are now mediated primarily within a computer environment. Methodologies that allow for the analysis of these networks are known as network science, network mining, and social media mining (Zafarani et al. 2014) and are used and exploited in many disciplines. They are of interest in Geography and Regional Science because they weaken or negate the influence and friction of distance, a fundamental concept in both disciplines. The diminishing influence of distance aids globalization and a global perspective. Studying the implications of social media and social networks, to determine how they impact economic and social activity using network analysis and related techniques, is both a pure and an applied science and involves both quantitative and qualitative analysis; the latter does much to insulate the discipline of regional science from the criticisms levelled at it almost two decades ago by Barnes (2003).

2 Social Media as Social Networks

Today many of the preeminent social networks (SN) are mediated on computer platforms (Boyd and Ellison 2007), and this is certainly true from the perspective of their economic and political impact. These social networks serve many different purposes and have many different foci, including friendship/companionship, education, recreational activities, news and information, hobbies, professional organization, and multimedia. This chapter will focus primarily on those instances of social media (SM) that function as SN and will concentrate on the analysis of their impact and influence. These will be referred to as social media/social networks (SM/SN).

The most effective way to stay up-to-date on SM/SN and gain an understanding of their current status is to consult the pages of Wikipedia, which, ironically, some academics consider to be more of an SN than an encyclopedia because of the way it gathers and edits its content. Wikipedia notes that its category of SM has 158 separate pages/topics, but it also has 13 subcategories, one of which is SN. This subcategory of SN itself has 146 separate topics, including social network analysis (SNA).

In addition to Geography and Regional Science, SNA is of interest to an array of social science disciplines including anthropology, communication studies, criminology, demography, development studies, political science, and sociology, among more than a dozen other social science disciplines to varying degrees, as well as some natural sciences such as biology and its subdisciplines, and also information/data science and computer science. Indeed, it is the latter discipline that has generated two editions of the Encyclopedia of Social Network Analysis and Mining (Alhajj and Rokne 2018). A vast array of topics is explored in these encyclopedias including, for example, scholarly networks of academics and researchers.

The subject of exploring the scholarly network of regional scientists and their citations since Walter Isard organized the first regional science meeting in December 1954 (Barnes 2003) would be rewarding and amenable to social network analysis but must, for the present, be left to others.

SM, arguably today's most dominant form of SN, are Web 2.0 applications characterized by user-generated content and profiles that are mediated and maintained by the organization that owns the site. The organization creates and develops the infrastructure that allows users to connect into groups and communities, and it usually gains a proprietary interest in all materials posted to the site. As Kitchin (2017, 180) has argued, "if the product is free, you are the labour and the product."

What exactly an SN company is has been debated in the traditional media and in the courts of law. This has become an overwhelming concern in recent years because whether an SN company, such as Facebook (initially characterized as the social network), is defined as equivalent to traditional media such as newspapers, radio stations, and television networks will determine how SN are regulated and taxed and how profitable they are. If they are defined as being different from all previous media, then new laws and regulations must be enacted. If these social networks are deemed essentially the same as traditional media, then they should be regulated and taxed in the same way. In a recent court case, Facebook's legal representatives argued that they were indeed a publisher that had to make editorial decisions. Commentary in The Guardian newspaper stated that this opinion contradicted Facebook's previous claims (Levin 2018):

Facebook has long had the same public response when questioned about its disruption of the news industry: it is a tech platform, not a publisher or a media company. But [in a court case on July 2nd, 2018, Facebook attorneys] presented a different message from the one executives have made to [the US] Congress, in interviews and in speeches: Facebook, they [now] repeatedly argued, is a publisher, and a company that makes editorial decisions, which are protected by the first amendment. [links to other stories have been removed]

Since Facebook was arguing for protection under the US First Amendment, it is likely that it will be affected differently by the regulatory regimes of other countries, and it might be envisaged that the company would not be averse to operating as different companies within differing legal frameworks in other countries, a situation that might be imposed by some forms of antitrust legislation (see the discussion below). But perhaps not. On May 25, 2018, the European Union's (EU) General Data Protection Regulation (GDPR) came into force, establishing some of the most stringent privacy laws anywhere. The law applies to all companies that process the personal information of individuals within the EU, which certainly includes SN. Companies in violation of the GDPR can be fined up to 20 million euros or 4% of their previous year's revenue. Companies operating under the GDPR must provide users the means to see the personal data the company holds about them; enable users to correct, delete, or remove any of these data; and notify authorities of any data breaches within 72 h.

In addition, the GDPR requires a right to an explanation of "algorithmic decisions." Since these algorithms are almost always proprietary and usually protected by patents, these explanations would presumably be generalized and, for example, coding details would not be released.

For many, Facebook's NewsFeed service is particularly insidious since it is micro-focused and curated according to what Facebook users have liked in the past, to what their friends like, and to other influences such as their profile characteristics. This means that NewsFeed becomes an increasingly small, ever-diminishing echo chamber. Traditional media, such as newspapers, usually hew to a political viewpoint. Ideally this might be seen as reflecting "the norms of society," but this has not necessarily worked well in the past. It is more reasonable to suggest that traditional news media reflect the views of their subscribers and their market, whether local, regional, or national. Otherwise their papers will not be bought, their radio stations will not be listened to, and their TV networks will have too few viewers. It might also be argued that some "quality" papers will have opinion page writers who are iconoclastic and willing to present views that differ from the newspaper's orthodoxy. This might be true of "quality" radio stations too. A reader, listener, or viewer can always buy a different paper or change the channel. The process is transparent, and thus once again the medium is the message: "For the 'message' of any medium or technology is the change of scale or pace or pattern that it introduces into human affairs" (McLuhan 2006, 108). SN have indeed changed the scale and pace and pattern of the distribution of news and information within countries and across the globe.

3 The Leading Social Media/Social Networking Sites

In rank order, the leading social media/networking sites and their numbers of users in millions (2018) are Facebook (2,989), YouTube (1,900), WhatsApp (1,500), Facebook Messenger (1,300), WeChat (1,040), Instagram (1,000), QQ (806), Qzone (563), TikTok (500), Sina Weibo (411), Twitter (336), and Reddit (330). WhatsApp, Facebook Messenger, and Instagram are all now owned by Facebook, while Google owns YouTube.

3.1 Analysis of Facebook's Growth

One way to understand the economic reach, power, and domination of the current SM/SN sites is to look at their acquisition and merger history. Facebook was founded in 2004 and began acquiring companies the following year. Since 2004 it has acquired a total of 71 companies. Almost all were in the USA, and many were located in the San Francisco Bay Area. Most of these acquisitions were made to acquire the talented people working at the target companies, and once acquired the companies were shut down. The acquisitions of Instagram in 2012 for $1 billion, WhatsApp in 2014 for $19 billion, and Oculus VR (a virtual reality company) for $2 billion in 2014 were exceptions to this rule.

They remain separate companies but, of course, are owned by Facebook. Thus, Facebook owns the first, third, fourth, and sixth most popular SM/SN sites in the world. Facebook is now available in 140 languages and since 2005 has been accessible worldwide except in those countries that explicitly block it, including China, Iran, Bangladesh, and North Korea.

In 2018 Facebook and Instagram were estimated to have almost $34 and $7 billion in advertising revenues, respectively. At the end of September 2018, Facebook was charging about 100 companies for selling targeted advertising to its users. While it is reasonable to suppose that advertising revenue will grow with the growth of the world economy, it is also reasonable to assume that much of this growth in the revenue of SM/SN companies will come at the expense of the traditional media, that is, TV, radio, and print. Since many traditional media entities now have a presence on smartphone apps, there may be some hope that SN/SM will not come to dominate exclusively. Nevertheless, the percentage of Americans using some form of social networking site rose from 5% in 2005 to 63% in 2013 (an average increase of 7.25 percentage points per annum) but then rose to only 69% by 2018 (an increase of just 1.2 percentage points per annum, suggesting that a saturation point was close to being reached). The growth in the number of Facebook users may therefore resemble the traditional logistic curve for adopters, although the initial slow adoption phase was unusually short and the present phase of slower growth may extend for a great many years; a small illustrative fit follows below.
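As a small illustration, the sketch below fits a logistic adopter curve to the three adoption figures quoted above (5% in 2005, 63% in 2013, 69% in 2018). With three data points and three free parameters the fit is essentially exact and purely illustrative, not a forecast.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    """Standard logistic adoption curve: K is the saturation level,
    r the growth rate, and t0 the inflection year."""
    return K / (1.0 + np.exp(-r * (t - t0)))

years = np.array([2005.0, 2013.0, 2018.0])
share = np.array([5.0, 63.0, 69.0])  # % of Americans on social networking sites

params, _ = curve_fit(logistic, years, share, p0=[70.0, 0.5, 2010.0])
K, r, t0 = params
print(f"saturation ~{K:.0f}%, inflection ~{t0:.0f}")
print(logistic(2025.0, *params))  # implied share if the fitted curve held
```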

3.2 Analysis of Google's Growth

Google, or more accurately its holding company Alphabet (since 2015), is not per se an SM/SN company, but Alphabet owns YouTube and Google+, which are SM/SN companies, and, like Facebook, Google/Alphabet has a pernicious record of acquisitions. It has acquired more than 200 companies to date and averaged more than one acquisition a week throughout 2010 and 2011. The price of many of these acquisitions has not been disclosed, but of those that are known, many were bought for over $100 million. The most expensive was Motorola Mobility, acquired for $12.5 billion.

Google, founded in 1998, acquired its first company, Deja News and its archive, in 2001, and thereafter facilitated access to Usenet through Google Groups. One feature of Usenet was its hosting of moderated and unmoderated "news feeds," or items posted by members. The former was subject to censorship complaints and the latter to spam and "flame wars." But in one sense Usenet could be seen as a user-generated news source or online paper: it was how users, the members of this SN, found out about what was happening in their world.

3.3 Google–Facebook Competition

Another significant acquisition for Google was Dodgeball, a location-aware app that allowed users to notify friends, and friends of friends, about their location and the location of events and businesses nearby. Dodgeball became Latitude once it was acquired by Google in 2005.

Latitude was shut down in 2013, and its functionality moved to Google+, Google's SN competitor to Facebook. In October 2018 Google announced that it was closing Google+ and that the consumer version would be completely shut down by the end of August 2019. Despite Google+ having over 100 million users, Google cited low user engagement and the potential for a data breach exposing users' data, with the attendant possibility of attracting increased legislative oversight and regulation. For Google, this was the final failure in a long list of similar products: Google Buzz, Google Friend Connect, and Orkut. In this battle between two SN titans, Facebook and its CEO, Mark Zuckerberg, had emerged victorious.

The outcome of this struggle is a good example of the law of increasing returns and the influence of positive feedback in competition between tech companies. A "winner takes all" outcome is almost always the final result. It is also the reason why tech companies and social media and networking companies should be subject to antitrust regulation to avoid monopolistic behavior.

4 Social Networking Sites–Print Media Competition

The New York Times (NYT) is one of the most successful papers in the world, and an analysis of its revenue streams for 2017 suggests what the future may hold for similarly successful, nationally dominant newspapers, but not for smaller local/regional newspapers. By the end of 2017, the NYT had more than 2.6 million digital-only subscriptions, and that source of revenue increased by over 46% in 2017. Subscription revenue accounted for 60% of total revenue. Digital advertising rose 14% and print advertising fell 14%, but print advertising had the larger base, so total advertising revenue still fell by 4%. The message for the future was that total revenue would come from subscriptions (primarily digital) and digital advertising. Total revenue in 2017 had grown by 8% to $1.7 billion. The future looked OK, at least for the largest paper in the USA.

For the US newspaper industry as a whole, the statistics for the period 2004–2017 have been less sanguine. Employment was at its highest in 2004 (74,410), fell slightly to 71,070 in 2008, then precipitously to 55,260 by 2010, and then more steadily to 39,210 in 2017. Weekday circulation started falling rapidly in 2004, from almost 55 million to less than 31 million in 2017. Digital circulation presents a more optimistic picture. The NYT and The Wall Street Journal gained 42% and 26% in digital circulation, respectively, between 2016 and 2017; the rest of the US newspaper industry (minus The Washington Post) lost 9%. Advertising revenue reached a peak in 2005 of $49.435 billion but has since declined to an estimated $14.476 billion in 2017. Circulation revenue remained remarkably steady during this period and now stands at an estimated $11.211 billion.

Some aspects of the industry (such as employment) have looked bleak, and others, such as circulation and digital subscriptions, not so much. However, as Facebook and its associated SM/SN companies increase their overwhelming dominance of the social networking world, this is likely to change dramatically. Interestingly, when NYT articles are linked within Facebook, readers can pay a subscription fee to the NYT (and other news companies) to continue reading those articles through the Facebook interface.

5 Location and Social Network Technology

5.1 Regional Contextual Implications for the Formation of Online Social Networks

In some social networks a user's location matters, though the scale of that location can vary, and geographic context matters for some connections. Language learning applications, such as Tandem and HelloTalk, connect users from different regions of the world based on the languages they speak and the languages they are learning. While this is not a function of GPS location, the user's stated location is important in the context of these specific social network relationships and helpful for language and cultural immersion. This is different from social networks, such as Facebook and Twitter, that do not depend as much on location for network connectivity. In the language applications, language and culture are tied to place, but in other social networks, place can be a remnant of previous face-to-face contact, if physical contact existed at all. Trade and travel (e.g., Airbnb) networks are also examples of where regional context can play a role.

Another relatively new innovation is the commercialization of DNA testing for ancestry information. Companies like Ancestry and 23andMe sell kits to buyers, who submit their saliva, which is sent back to the company and processed for genetic results. The interesting social network aspect is that the test connects people to others who share their DNA and provides them a channel to communicate. In this way, the company is constructing biological social networks where people find relatives they never knew they had. The results of the DNA test are all place based: each test tells users how much of their DNA can be mapped to various regions as designated by the companies. The social network is also spatial in that users disclose where they live at present, so all members get a sense of where their ancestors lived and migrated to and where their living relatives reside now.

5.2 Hyperlocal Marketing in Social Networks

Hyperlocal is the term used for delivering goods, services, and information primarily to GPS-enabled smartphone users across their immediate neighborhoods. Google has leveraged these technologies especially in conjunction with its Maps app, which provides the company with the user's home location. Services, points of interest, and commercial operations such as stores, cinemas, and other potential destinations, including educational institutions, libraries, and places of worship, can all benefit from being promoted to prospective patrons and customers living in close proximity. Hyperlocal algorithms can focus marketing procedures across social networks that recognize socioeconomic characteristics and exploit friendship networks. Hyperlocal targeting can also be used to inform residents or members of a network about upcoming events or, for example, recent criminal activity in a neighborhood. A social networking site can thus exploit time and space in a highly focused marketing strategy. Local news and election campaigning are other activities that have exploited hyperlocal marketing across social networks. In 2016 Facebook announced a new and upgraded version of its Marketplace service, which had evolved out of Buy and Sell Groups within the social network.

The Places feature in Facebook, launched in 2010, allowed users to let their Friends know where they were at a given time, and the Deals feature within Places allowed users to check in from restaurants, coffee shops, bars, supermarkets, and other businesses to receive coupons, discounts, and other promotions. In this instance, rather than acquiring Yelp, which offered a similar crowdsourced restaurant review service, Facebook decided to compete directly. Now, with such complete penetration of the economy, every business has to have a presence on Facebook, and every individual needs one as well. In Dave Eggers' (Eggers 2013) fictional world dominated by a social network, the "circle" is closing, and you have no option but to join. Hyperlocal "marketing" was also practiced without transparency when the firm Cambridge Analytica reportedly obtained the accounts of 87 million Facebook users in its alleged interference in the 2016 US presidential election.

Another innovation that utilizes GPS technologies is the advent of social networking applications that facilitate face-to-face interaction. Mainly used as dating tools or to promote sexual activity, such apps, including Tinder and Grindr, connect people within a specified distance based on demographics, physical appearance, and stated behaviors, though both parties must agree to meet in person; a sketch of the underlying distance filter follows below. This innovation has the potential to extend beyond dating or sexual networks into job and employment networks (e.g., LinkedIn), friendship networks, and other activity networks (e.g., Meetup). However, at present, dating seems to be the most lucrative.
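The distance filter at the core of such proximity-based matching might look like the sketch below. The haversine great-circle formula is standard, but the coordinates and the 5 km radius are illustrative choices, and real applications layer preference filters and mutual consent on top of the distance test.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def nearby(me, users, radius_km=5.0):
    """Return ids of users within radius_km of the querying user's location."""
    return [uid for uid, (lat, lon) in users.items()
            if haversine_km(me[0], me[1], lat, lon) <= radius_km]

users = {"u1": (51.05, -114.07), "u2": (51.07, -114.10), "u3": (53.55, -113.49)}
print(nearby((51.045, -114.06), users))  # -> ['u1', 'u2']; u3 is in another city
```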

6 Antitrust Action and Social Networks

One way to reduce the ever-increasing dominance of the SM/SN companies is to subject them to antitrust legislation, which could have significant impacts on the economic landscape. There are various anti-monopoly actions that could be taken against Google and Facebook, among others, to mitigate the crushing dominance and anticompetitive behavior of the tech giants. The EU has already assessed Google a $5 billion fine for illegally tying Chrome and search apps to its platform on Android and a $4 billion fine for coercing users into sharing their personal data while providing inadequate opt-out possibilities.

One solution would be to block companies such as Google from acquiring and assimilating other companies. Companies such as Waze, acquired by Google in 2013, could not simply "wait to be acquired"; rather they would have to remain separate and compete directly with Google. To prevent such acquisition behavior, US Senator Amy Klobuchar (D-MN) introduced a bill in 2017 to amend the US Clayton Act so that no company with a market capitalization of more than $100 billion could acquire another company. That such action is needed has been emphasized by Google's November 2018 decision to merge the artificial intelligence company DeepMind (acquired by Google in 2014) into the main company, allegedly breaking a promise to the UK National Health Service that the accounts of UK patients would never be linked to other Google accounts and services.

The independent review board that had been created to oversee DeepMind's work with the health sector was dismantled at the same time.

In 2018 US Senator Mark Warner (D-VA) drafted a white paper containing 20 recommendations for regulating "Big Tech Platforms." These recommendations fall into three broad groupings: (1) combating disinformation, (2) protecting user privacy, and (3) promoting competition. More specific proposals included providing resources for media literacy programs and for helping military and intelligence communities combat disinformation campaigns, the so-called fake news; identifying bots versus authentic accounts; legal liability for disclosure of private data and false information; characterization of certain products as "essential services" requiring access on "fair, reasonable, and nondiscriminatory" terms and a ban on "preferential conduct"; increased powers for consumers to limit the use of their data, possibly similar to rules in the EU's GDPR; and a suggestion that platforms place a monetary value on users' data, thereby making acquisitions more costly.

There was no suggestion in the white paper that companies such as Facebook should be broken up, which was the consequence of previous antitrust actions against the Standard Oil Company in 1911 and the American Telephone and Telegraph (AT&T) Company in 1974. Since 1974 the antitrust environment in the USA has weakened, and following the passage of the US Telecommunications Act of 1996, four of the seven "Baby Bells," which had been divested from AT&T following the antitrust action of 1974, have been recombined with AT&T. In addition, Exxon and Mobil Oil Corporations, formerly components of Standard Oil, recombined in 1999, arguing that there was now sufficient competition from foreign state-owned and state-backed companies in the OPEC cartel and in Russia.

In recent years, despite the renewed pro-merger investment climate, many have suggested that the major social networking and tech sites should indeed be broken up. One way to achieve this in the case of Facebook would be to reverse its acquisitions of companies such as Instagram and WhatsApp. An alternative would be to break the company into independent regional entities, as was legislated in the case of AT&T, producing the seven Baby Bells. It might be argued that even the current US antitrust laws could be used to facilitate the breakup of the major social networking monopolies and other tech companies, and their insatiable acquisition history of the recent past might make this more feasible. However, as the social networking and related platforms become more dominant and politically influential, the door may be closing on these possibilities.

7 Quantitative Social Network Analytics

There are a great number of quantitative metrics that have been developed for SNA (Waters 2013; Zafarani et al. 2014). Those of most interest to regional science fall into two broad groups: measures of structure and visualization, and procedures for content analysis.

7.1 Measures of Structure and Visualization

SNA measures of structure include the following statistics developed in network science and various cognate disciplines:

General Network Structure: the overall structure of the connections between nodes influences the efficiency of traversing the network, where traversals can represent information diffusion, the spread of disease, the growth of trade, or the development of transportation infrastructure. Two important models of network structure are small-world and scale-free networks. Small-world network models define networks whose connectivity lies between random and regular structures. They mimic real-world social networks in that people are connected through others within a relatively low number of traversals of the network, in the geodesic sense. Scale-free network models focus on the attractiveness of nodes within a network. Members of a social network vary in their attractiveness, whereby some amass more connections and can even dominate the connectivity within the network. In a scale-free network, there are few nodes that have many connections and many that have few. In this model, superspreaders and influencers (see below) are responsible for a large amount of diffusion and activity across the network.

Characteristics of Connections: homophily, the linking of nodes (people, groups, organizations) with others having similar socioeconomic characteristics such as age, gender, race, education, occupation, and political affiliation; the transitivity of friendships, that is, whether, if A is a friend of B and B is a friend of C, C is also a friend of A; and propinquity, measuring how friendships decline with distance. It is these measures, among others, that can determine how fast "fake news" will spread through a network, and they can be used for microtargeting.

Measures of Node Distributions: structural holes separate a network into disconnected parts or subgraphs with no connecting link; a related concept is a bridge or weak tie that connects two parts of a network that would not be connected otherwise; tie strength is some measure of the importance of ties and is related to homophily and possibly propinquity (although distance may be measured in many ways (Waters 2013; Zafarani et al. 2014)); average distance is the mean number of links between two nodes; and the diameter is the shortest path between the two most widely separated nodes in the network. Structural holes result in disconnected networks with subgraphs that are distinct from each other and that in social networks are often referred to as echo chambers.

Segmentation: includes the identification of groups such as cliques (all nodes are connected to each other) or social circles (the nodes are connected but with some less complete connections) and clustering coefficients (local or global) based on the ratio of closed triplets (three nodes fully connected by three links/ties) to all triplets (closed plus open, the latter connected only by two links). A small computational example follows below.
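As a small computational example of the measures just defined, the sketch below uses the open-source networkx library (an assumption of convenience, not one of the packages discussed in this chapter) to compute global and local clustering coefficients and the average path length for a toy friendship network.

```python
import networkx as nx

# A toy friendship network: one closed triangle (A, B, C) plus a pendant tie
G = nx.Graph([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])

# Global clustering coefficient: ratio of closed triplets to all triplets
print(nx.transitivity(G))  # -> 0.6 (3 closed triplets out of 5 total)

# Local clustering: how close each node's neighborhood is to a clique
print(nx.clustering(G))    # e.g., node C scores 1/3, node A scores 1.0

# Average shortest path length, related to the small-world property
print(nx.average_shortest_path_length(G))  # -> 1.333...
```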

7.2 Procedures for Content Analysis

Huang (2018) has suggested a five-step process for SM/SN analytics: data collection; data storage; text processing and mining; spatial/spatiotemporal analysis and data mining; and, finally, geovisualization.

Data collection, or harvesting as it is frequently called, can involve manual methods, application programming interfaces (APIs), or innovative methods that carry out geospatial searches for specific terms or events, such as floods and hurricanes in a disaster management study.

The SM/SN databases, which are usually massive and unstructured text files from Web 2.0 sources with user-generated content, normally must be stored in NoSQL database structures as opposed to the relational, SQL databases commonly used in traditional GIS studies. NoSQL databases come with a variety of data models, including wide column, document, and graph data models, with some, such as MarkLogic, permitting both document and graph formats. These data models are particularly useful for SM/SN data storage.

Zafarani et al. (2014) provide a comprehensive discussion of text processing and data mining approaches for SM/SN, including sentiment analysis (e.g., like/dislike, particularly important in elections); community analysis (detection, growth/evolution, and evaluation); spammer detection (which may involve supervised and unsupervised algorithms and other data mining procedures to detect anomalies and patterns associated with previously identified spam); event detection; measuring influence and influencers; detecting homophily; and detecting, determining, and mitigating user/node vulnerability, among others. Machine learning for statistical approaches to natural language processing is another important approach to text analysis within SM/SN analytics. Zafarani et al. (2014) recognize that groups and communities on SM/SN sites do not need to share geographical location and may be bound together for reasons other than geographical proximity, since they only meet in computer-mediated network space.

The fourth step of Huang's five-step sequence for SM/SN analytics is spatial and spatiotemporal analysis (see Huang 2018). This may take many forms. For example, the Google-owned subsidiary Waze provides GPS-based navigation information to drivers. This navigation information is provided by users (the "users are the labor and the product"), and so it is available in any country with a sufficient user base. The information available within the Waze system is essentially volunteered geographic information (VGI) on best routes, traffic conditions, weather conditions, accidents, points of interest, landmarks, restaurants, gas stations, and other types of businesses. Waze has partnered with radio stations, streaming music suppliers such as Spotify, and the ebook supplier Scribd, among others, to enrich the user's experience. Its acquisition by Google triggered preliminary investigations into anticompetitive behavior in the USA, the UK, and Israel (where it was developed), but these investigations were not pursued.

Visualization of social network structures, the fifth and final step in Huang's methodology, has been facilitated by sophisticated software packages. These fall into three primary categories: first, packages with sophisticated graphical user interfaces such as UCINet, Pajek (Waters 2013), Social Network Visualizer, and Gephi (the second, third, and fourth can be freely downloaded); second, those that interface with scripting languages, such as Python (NetMiner) and MuxViz (based on R and useful for networks where nodes have multiple/layered relations, as in Facebook, where work and home networks and social interests may be distinct); and, finally, the Statnet collection of packages based on the R programming language.
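For those who prefer scripting to the GUI packages above, a quick structural plot can be produced with the open-source networkx and matplotlib libraries; this is an illustrative substitute for, not an example of, the packages just listed.

```python
import networkx as nx
import matplotlib.pyplot as plt

# A small-world style toy network: a ring lattice with a few random shortcuts
G = nx.watts_strogatz_graph(n=30, k=4, p=0.1, seed=42)

# Size nodes by degree so potential influencers stand out visually
pos = nx.spring_layout(G, seed=42)
sizes = [100 * G.degree(n) for n in G.nodes]
nx.draw(G, pos, node_size=sizes, node_color="steelblue", edge_color="gray")
plt.savefig("toy_network.png")  # or plt.show() in an interactive session
```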

8 How Value Is Generated with a Social Network

If the product is free, and joining a social network is almost always free, then those who build and invest in a social network must develop a sound business model to monetize the network. This is of interest to the network's shareholders, but the ability to influence economic behavior, and where that influence is located and emerges in the real world, is also of primary interest to the regional scientist. Not only will this influence grow economic activity such as purchasing behavior in specific locations, but it will also disrupt traditional industries in those same locations. These disruptions will affect traditional advertising in the affected locations and bricks-and-mortar retailing operations. For instance, the marketing of cosmetic products through online social networks will affect all the stores in an area that sell those products. Those viewing cosmetic products on a social network such as YouTube will tend to buy more of those products online than in their neighborhood.

Startups wishing to have persons known as "influencers" promote their products on Instagram and YouTube may have to pay these individuals tens of thousands of dollars to develop promotional videos. The cost of purchasing such video endorsements rises with the number of "followers" that an influencer has. A "super-influencer" or superspreader may have more than a million followers. Influencers themselves may have agents or management teams that find the businesses seeking endorsements. Other influencers will use websites and apps that function as digital marketplaces. Although the marketed products can still be bought offline, an entirely new industry has supplanted retailing from bricks-and-mortar locations. Two well-established digital markets are FameBit and Grapevine. FameBit was bought by YouTube in 2016. By mid-November 2018, YouTube, through FameBit, had partnered with over 9,000 brands and a great many additional influencers. All of this generates new revenue for the social network.

Influencers can be paid to disparage competitors' products and can "game the system" by purchasing likes and followers from companies such as TagScout that market such services. In November 2018, Instagram, owned by Facebook, said that it would use machine learning algorithms to boost its existing techniques for detecting accounts that artificially grow their likes, followers, and comments. These new procedures cannot be used retroactively to remove followers from users who have already manipulated the system. US Federal Trade Commission guidelines state that influencers have to disclose their relationship to any business whose product they endorse. Social network companies are secretive about the algorithms that govern their systems, but the academic literature provides many discussions of these procedures.


For example, Li et al. (2018) present an exhaustive review of influence maximization (IM) on social networks. Their study considers viral marketing, the diffusion of the adoption of a new product from some initial adopters through a social network. Five models were investigated:

(a) An independent cascade (IC) model, where a user is influenced independently by each of his or her neighbors. Li et al. (2018) review a number of methods for estimating these probabilities.
(b) A linear threshold (LT) model, where an inactive user becomes an adopter once a certain threshold of his or her neighbors has been reached.
(c) A triggering model, which generalizes the IC and LT models as special cases: an inactive user becomes an adopter when a subset of his or her neighbors reaches a certain probability.
(d) Time-aware diffusion models. The first three models are time-unaware, but propagation campaigns (especially important in political contexts such as the 2016 UK Brexit Referendum and the 2016 US presidential election) are often time critical. Time-aware models can be formulated in discrete-time steps (as IC or LT models) or as continuous-time models, where the likelihood of propagation between adopters and non-adopters is a continuous function of time.
(e) Nonprogressive models, in which an adopter can become a non-adopter; these are used in voting preference models.

Li et al. (2018) discuss ways of simulating influence using Monte Carlo simulations. Such algorithms do not scale for large graphs or networks, and so they propose a series of proxies for ranking influence, such as node degree (i.e., the number of links/connections to a user), Google's PageRank algorithm, or network measures such as distance centrality, reviewed above. Other algorithms that speed up the simulation approaches and make them scalable break the network into pieces or subgraphs, where structural holes or weak ties may be used to split up the network. Finally, the context-aware group of algorithms includes topic-aware IM algorithms that include measures of user benefit. Time-aware IM algorithms impose a time constraint as a stopping mechanism. Location-aware IM algorithms maximize the influence of location-significant users, which is appropriate for influence peddling in political campaigns and, of course, with users of social networks such as Twitter, Foursquare, and especially Facebook. They might also be referred to as distance-aware algorithms, since either approach might be useful in targeting users within a political district. Dynamic IM algorithms would be appropriate in a social network where influence evolves, which again is how electioneering works in a political campaign. The last type of context-aware algorithm is the competitive IM algorithm, which would be appropriate in marketing but is particularly appropriate where there are two or more competing political parties. Indeed, one reported fake news strategy in the 2016 presidential campaign was to seed extreme political views of both the Republican and Democratic parties within a well-defined location. In suggesting future challenges in the design of IM algorithms, Li et al. (2018) suggest not only including the social network structure of links between friends but also
modeling the importance of group norms. This might include past affiliations and membership in clubs and the traditional socioeconomic status variables such as gender, age, income, religion, and ethnicity, but to be effective they would have to be easily identifiable by data mining algorithms within the social network.
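
A minimal sketch of the independent cascade model (model (a) above) follows, with a Monte Carlo estimate of expected spread and node degree used as the cheap proxy for seed selection that Li et al. (2018) discuss; the random graph, activation probability, and seed-set size are arbitrary illustrative assumptions, not values from their survey.

```python
import random
import networkx as nx

def independent_cascade(G, seeds, p=0.1, rng=random):
    """One IC realization: each newly activated node gets a single
    chance to activate each inactive neighbor with probability p."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        new = []
        for u in frontier:
            for v in G.neighbors(u):
                if v not in active and rng.random() < p:
                    active.add(v)
                    new.append(v)
        frontier = new
    return len(active)

def expected_spread(G, seeds, p=0.1, runs=1000):
    """Monte Carlo estimate of the expected influence spread."""
    return sum(independent_cascade(G, seeds, p) for _ in range(runs)) / runs

# Arbitrary random network standing in for a real social graph.
G = nx.erdos_renyi_graph(200, 0.03, seed=1)

# Degree as a cheap proxy for influence: rank nodes, take the top 3
# as the seed set, then evaluate it with the expensive simulation.
seeds = sorted(G.nodes, key=G.degree, reverse=True)[:3]
print("degree-ranked seeds:", seeds)
print("estimated expected spread:", expected_spread(G, seeds))
```

Even this toy makes the scaling problem visible: the 1,000 cascade replications needed for one stable spread estimate must be repeated for every candidate seed set, which is why proxy rankings and subgraph decompositions matter on real networks.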

8.1 Microtargeting and Fake News

Arguably one of the most successful and lucrative commercial applications of geographic information systems (GIS) technology has been geodemographic marketing (also known as market segmentation, target marketing, and geo-clustering). A pioneer in this technology was the Claritas Corporation, which developed its PRIZM system for profiling consumers in the USA. In the 1980s, Claritas was able to use the US TIGER Census files to divide US zip codes into 40 socioeconomic profiles. More recently these were increased to 62 profiles. This process of market segmentation using geodemographics was also replicated in other countries such as the UK, where CACI Ltd. pioneered the Acorn technology (1.9 million postcodes were split into 6 categories, 18 groups, and 62 types). In Canada, similar geodemographic profiling and market segmentation systems are now implemented by the Environics firm. In addition to geodemographic data, the Environics methodology uses psychographic data (based on various personality characteristics) as well as recreational interests, lifestyles, values, and attitudes. These are used to produce 68 lifestyle types. In a separate analysis, customers have been segmented into 150 geodemographic types by 6-digit postal codes. The Mosaic system for target marketing is used in many countries in Europe and indeed around the globe. Segmentation on the basis of behavioral, lifestyle, social, and personality characteristics has been used, but segmentation that includes geography has many advantages, because the data is usually available from national censuses (today often at no cost) and the "targets" are spatially concentrated, a significant benefit in commercial marketing, political advertising, and regional science research.

If the USA is divided into 62 profiles, the UK into 62 types, and Canada into 68, the segmentation is hardly a very sophisticated classification, and there may be considerable variation within a given zip code or postal code. More recently, with respect to social networks, these criticisms have led to the development of far more sophisticated targeting called microtargeting (Metaxas and Mustafaraj 2012) and, less commonly, nano-targeting.

The George W. Bush presidential campaign of 2004 was a pioneer of microtargeting. In that election campaign, Republican strategists claimed that their microtargeting of Black voters with anti-gay marriage ads in the so-called battleground state of Ohio allowed them to boost their percentage of the Black vote from 9 percent to 16 percent. Microtargeting the Black vote was probably not a deciding factor in Bush's win, but it is likely that his margin of victory would have been extremely low had they not used these ads.

In the 2012 US presidential election, the Republicans were once again beaten by the Democratic campaign's use of social media. The segments of society that made the difference were young women, Latinos, African Americans, and Asian Americans. These communities are
credited with helping Barack Obama defeat Mitt Romney, in part because they were faster adopters of social networking and mobile media. Metrics relating to social networks have been used to explain why Obama won, including the fact that he had a network of 1.2 million followers on Facebook, about twice as many as Romney. Obama also had twice as many YouTube likes, comments, and views and more than twice as many retweets. The key influence of social networks and other media was the identification of like-minded voters. The Obama campaign used four scores (similar to credit scores) to assess voting behavior in the so-called "swing states": likelihood of supporting Obama, likelihood of showing up to the polls, likelihood of responding to campaign reminders, and likelihood of being persuaded by a single issue to change their minds and vote for Obama.

Did the Republicans learn from their failure in using social networks and SNA in order to win the 2016 election? Absolutely. The digital director for the 2016 Republican presidential campaign, Brad Parscale, who tested up to 50,000 ads daily on Facebook, believed that it was this strategy of microtargeting voters that helped the Republicans win the White House in 2016. Facebook provided the Republican campaign with employees who helped train the campaign workers in how to design and implement the ads. Variations of design, color, phrasing, and background of the ads were constantly tested for maximum effectiveness and targeted at Facebook users in groups as small as 15 who would never have been the focus of a traditional media campaign. Lists of registered voters from public records and Republican databases were matched to Facebook users. In 2011 Facebook had its campaign ads classified as minor items, similar to campaign buttons, that did not require reporting to the Federal Election Commission. This decision is now under review, which is not surprising since Parscale's firm was paid $94 million by the conclusion of the successful Republican campaign.

9 Analyzing Fake News on Social Networks: Regional Science Implications

There are many synonyms for fake news (FN) and closely related concepts: post-truth, post-factual, alternative facts, the big lie, propaganda, and spin. Fake news is not just a problem of the twenty-first century. It was formerly known as propaganda and disinformation, and Taylor (2018) has provided a 500-year history that spans the phenomenon from the time of Henry VIII, the second Tudor king of England, to the present-day crisis within SN. Taylor discusses the most notorious of the conspiracy theories, which might be characterized as "big lies," that plagued the 2016 US presidential election and spread virally throughout social networks such as Facebook and Twitter. A recent data dump by Twitter unveiled tweets from 3,841 accounts believed to originate from a Russian state-backed influence operation against the USA and 770 accounts from a similar campaign in Iran. Given the sheer size of SM/SN companies and the vulnerability of the public sphere, identifying and stopping disinformation operations before it is too late may prove to be extremely difficult.


Equally insidious is the blizzard of little lies that have a cumulative effect of destroying the credibility of any political discourse. President Trump, for example, tweets an average of six messages a day; he has over 68 million followers on his two Twitter accounts, and over 24 million accounts receive his Facebook posts, so his messages reach over 92 million people (Taylor 2018, 337) immediately and without cost. The implications for political advertising are overwhelming.

In 2018 the British Broadcasting Corporation (BBC) published two reports and a series of online articles that summarized its fake news (FN) research in India, Kenya, Nigeria, Brazil, Iran, and Mexico. This research describes the primary methods and platforms for distributing and amplifying FN. The BBC reports have shown that there are distinct regional patterns to the consumption and spread of FN, differences that must be of interest and concern to the regional scientist. Initially, concern over FN was focused on Facebook, its data breach exploited by Cambridge Analytica, and the impact of the ensuing meddling in the 2016 US presidential election and the UK's Brexit Referendum. More recently, attention has shifted toward Facebook's wholly owned subsidiary, WhatsApp, because communications on WhatsApp are encrypted, which makes it difficult to ascertain the extent of the problem and to determine how the information spreads through the network.

10 Social Networks and Social Media for Illegal Activities

Social network and social media technologies are being used widely for illegal activities. A relatively recent example is the use of multiple digital communication tools by ISIS to distribute propaganda, gain support (both material and emotional), attract and recruit followers, plan and carry out attacks, and threaten violence, among other uses. Other terrorist organizations have adopted the same online strategies. ISIS accounts and information could be found on many popular social network sites, including Facebook, Twitter, Instagram, and YouTube. By using SN rather than traditional media, terrorist groups are able to better control their message, which helps them define themselves and their activities.

Among communication tools, some offer encryption, which is attractive to those involved in criminal activities. While the use of popular SM/SN sites is valuable because a wide audience can quickly be reached, those sites can prove detrimental to the planning, strategizing, and carrying out of physical operations. Using non-encrypted technologies can also be detrimental to criminal activities. Terrorists have therefore been using apps such as Telegram, which promotes privacy and secrecy through encryption, secret rooms, and self-destructing messages. Applications such as these can serve as the origin for information that ultimately makes its way out to other SM/SN applications for wider distribution.

For those that need a much freer, safer space to roam, the Dark Web is available for activities that are difficult, if not impossible, to track. These can include the sale and purchase of weapons, drugs, and other illegal goods; access to message boards and to terrorist and criminal organizations; as well as perfectly legal activities one
would find on the surface web. Some websites have moved to, or operate other instances on, the Dark Web for operational privacy and greatly increased anonymity. Social networks and social media exist there; however, it is difficult to know to what extent, as, at present, no engine exists to catalog Dark Web sites the way search engines catalog the surface web.

11 Conclusion

When social network companies acquire other companies, close them down, and consume their talent, they are having an economic impact and changing the economic landscape. In the USA, the acquisitions by Facebook and Google have had an impact across the country but primarily in the San Francisco Bay region. The success of these social network companies has also had a direct impact on traditional media, especially the newspaper industry. Many people now get their news from social media networks that supply newsfeeds curated to their socioeconomic and political profiles. This can be an issue for those network members tuned in to sources of information that provide "fake news," "alternate facts," and other forms of information that can introduce or promote extreme ideas and sentiment.

Our lives and behavior are now so influenced by the Internet's information flows that we must give attention to our connections within it. Good and bad, truths and untruths, all flow more efficiently today than ever before. We are busy enjoying the benefits of hyperconnectivity and do not spend adequate time contemplating the negative effects on individuals and society. Have we become too connected? Victims of inescapable bullying may say so. Does maintaining online social networks limit our motivation and ability to reach out and meet new people and/or consider new ideas? It is theorized that there is a cognitive limit to the number of social relationships that humans can handle. While this limit varies based on the brain's processing capacity, the average number of social connections one can maintain is commonly accepted to be 150. This number is referred to as the Dunbar number, named after Robin Dunbar, who initially found correlations between brain size and social group size for nonhuman primates and, from his calculations, extrapolated this figure to account for average human brain size. Given that social networks may be limited and people are being targeted for the distribution of all types of information, our field of view may also be limited or, worse, designed.

What's to come for the future of SM/SN? One could anticipate, when Facebook bought the virtual reality company Oculus VR, that it would attempt to make online social connections more real. The Internet, and the social networks it facilitates, do keep people connected at any distance and, by doing so, weaken the influence of the friction of distance in personal relationships, even though we know that when we interact online, something is missing. Many newer social networking applications, and updates to old ones, include ways to meet in person. A way to mimic face-to-face interaction may be through VR technologies that have not been fully developed to date. Given a proper sensor and a VR headset, it may be possible in the near future to meet with friends in a virtual space that looks and feels real enough to pass as a
physical place. Even with the shortcomings of this potential technology, it would still provide an improvement over commenting on posts from people who, at one (or many) point(s), were coincident in space and time. In short, the most important innovations that SM/SN companies can make may be those that make us feel human again.

12 Cross-References

▶ Clusters, Local Districts, and Innovative Milieux
▶ Geographic Information Science
▶ Geovisualization
▶ Knowledge Flows, Knowledge Externalities, and Regional Economic Development
▶ Networks in the Innovation Process
▶ Spatial Network Analysis
▶ Spatio-temporal Data Mining
▶ The Geography of R&D Collaboration Networks

References

Alhajj R, Rokne J (eds) (2018) Encyclopedia of social network analysis and mining, 2nd edn. Springer, New York. First edition published 2014
Barnes TJ (2003) What's wrong with American regional science? A view from science studies. Can J Reg Sci 26(1):3–26
Boyd D, Ellison NB (2007) Social network sites: definition, history and scholarship. J Comp Mediat Comm 13(1):210–230
Eggers D (2013) The circle. Random House, New York
Huang Q (2018) Social media analytics. In: Wilson JP (ed) The geographic information science & technology body of knowledge (1st Quarter 2018 Edition). https://doi.org/10.22224/gistbok/2018.1.10
Kitchin R (2017) Leveraging finance and producing capital. In: Kitchin R, Lauriault TP, Wilson MW (eds) Understanding spatial media. Sage, London, pp 178–187
Levin S (2018) Is Facebook a publisher? In public it says no, but in court it says yes. https://www.theguardian.com/technology/2018/jul/02/facebook-mark-zuckerberg-platform-publisher-lawsuit. Accessed 4 Nov 2018
Li Y, Fan J, Wang Y, Tan KL (2018) Influence maximization on social graphs: a survey. IEEE Trans Knowl Data Eng. https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?referer=https://scholar.google.ca/&httpsredir=1&article=4983&context=sis_research. Accessed 19 Nov 2018
McLuhan M (2006) The medium is the message. In: Media and cultural studies: keyworks, revised edn. Blackwell, Oxford, pp 107–116
Metaxas PT, Mustafaraj E (2012) Social media and the elections. Science 338(6106):472–473
Taylor DJ (2018) Fayke Newes. The History Press, Brimscombe Port
Waters NM (2013) Chapter 38, Social network analysis. In: Fischer MM, Nijkamp P (eds) Handbook of regional science. Springer, Heidelberg, pp 725–740
Zafarani R, Abbasi MA, Liu H (2014) Social media mining: an introduction. Cambridge University Press, Cambridge. http://dmml.asu.edu/smm/SMM.pdf. Accessed 15 Nov 2018


Further Reading

Arthur WB (1994) Increasing returns and path dependence. The University of Michigan Press, Ann Arbor
Barabási AL (2003) Linked: the new science of networks. Perseus Books Group, New York
Beckett L (2012) Everything we know (so far) about Obama's big data tactics. https://www.propublica.org/article/everything-we-know-so-far-about-obamas-big-data-operation. Accessed 3 Nov 2018
Boyd D (2018) Amplification and responsible journalism. https://points.datasociety.net/media-manipulation-strategic-amplification-and-responsible-journalism-95f4d611f462. Accessed 17 Nov 2018
Cervone G, Sava E, Huang Q, Schnebele E, Harrison J, Waters NM (2016) Using Twitter for tasking remote-sensing data collection and damage assessment: 2013 Boulder flood case study. Int J Remote Sens 37(1):100–124. https://doi.org/10.1080/01431161.2015.1117684
Dunbar RI (1993) Coevolution of neocortical size, group size and language in humans. Behav Brain Sci 16(4):681–694
Edsall TB (2012) Let the nanotargeting begin. https://campaignstops.blogs.nytimes.com/2012/04/15/let-the-nanotargeting-begin/. Accessed 21 Nov 2018
Handley L (2017) Trump's digital campaign director was paid $1,500 to set up his election website. Then he raked in $94 million when Trump won. https://www.cnbc.com/2017/10/09/donaldtrumps-digital-campaign-directors-company-paid-94-million.html. Accessed 21 Nov 2018
Martinez AG (2016) Chaos monkeys: obscene fortune and random failure in Silicon Valley. HarperCollins, New York
Mosaic (2010) Mosaic global e-handbook. http://www.appliedgeographic.com/AGS_2010%20web%20pdf%20files/Mosaic%20Global%20E-Handbook.pdf. Accessed 3 Nov 2018
Parker R (2012) Social and anti-social media. https://campaignstops.blogs.nytimes.com/2012/11/15/social-and-anti-social-media/. Accessed 21 Nov 2018
Parscale B (2017) Trump digital director says Facebook helped win the White House. https://www.theguardian.com/technology/2017/oct/08/trump-digital-director-brad-parscale-facebook-advertising. Accessed 3 Nov 2018
Pew Research Center (2018a) Social media: fact sheet. http://www.pewinternet.org/fact-sheet/social-media/. Accessed 14 Nov 2018
Pew Research Center (2018b) Journalism and media: newspapers fact sheet. http://www.journalism.org/fact-sheet/newspapers/. Accessed 12 Nov 2018
Verge (2018) The monopoly-busting case against Google, Amazon, Uber, and Facebook. What tech companies have to fear from antitrust law. https://www.theverge.com/2018/9/5/17805162/monopoly-antitrust-regulation-google-amazon-uber-facebook. Accessed 21 Nov 2018
Warner MR (2018) Potential policy proposals for regulation of social media and technology firms. https://graphics.axios.com/pdf/PlatformPolicyPaper.pdf. Accessed 16 Nov 2018
Waters NM (2018) Tobler's first law of geography. In: Richardson D, Castree N, Goodchild MF, Kobayashi A, Liu W, Marston R (eds) International encyclopedia of geography: people, the earth, environment, and technology. Wiley, New York. https://doi.org/10.1002/9781118786352.wbieg1011
Watts DJ, Strogatz SH (1998) Collective dynamics of 'small-world' networks. Nature 393(6684):440
Weiss MJ (2000) The clustered world. Little, Brown, and Company, New York
Wikipedia (2018) Most popular social networks. https://en.wikipedia.org/wiki/Social_media#Most_popular_social_networks. Accessed 4 Nov 2018
Wu T (2019) The curse of bigness: antitrust in the new gilded age. Columbia Global Reports, Columbia University, New York

Land-Use Transport Interaction Models

Michael Wegener

Contents
1 Introduction
2 Theory
3 Operational Models
3.1 Spatial Interaction Location Models
3.2 Accessibility-Based Location Models
4 Current Debates
4.1 Equilibrium or Dynamics
4.2 Macro or Micro
5 Future Challenges
6 Conclusions
References

Abstract

The relationship between urban development and transport is not simple and one way but complex and two way and is closely linked to other urban processes, such as macroeconomic development, interregional migration, demography, household formation, and technological innovation. In this chapter, one segment of this complex relationship is discussed: the two-way interaction between urban land use and transport within urban regions. The chapter looks at integrated models of urban land use and transport, i.e., models that explicitly model the two-way interaction between land use and transport to forecast the likely impacts of land use and transport policies for decision support in urban planning. The discussion starts with a review of the main theories of land-use transport interaction from transport planning, urban economics, and social geography. It then gives a brief overview of selected current operational urban models, thereby distinguishing
between spatial-interaction location models and accessibility-based location models, and discusses their advantages and problems. Next, it reports on two important current debates about model design: are equilibrium models or dynamic models preferable, and what is the most appropriate level of spatial resolution and substantive disaggregation? This chapter closes with a reflection of new challenges for integrated urban models likely to come up in the future.

Keywords

Urban land-use transport interaction · Land use and transport policies · Equilibrium vs. dynamic models · Macro vs. micro models

1 Introduction

The history of urban settlements is closely linked to transport. Cities appeared in human history when technological innovation required the spatial division of labor between specialized crafts and agricultural labor and gave rise to urban–rural travel and goods transport. Cities were established at trade routes, ports, or river crossings and became origins and destinations of trade flows. Cities were compact, as all movements were done on foot, until the railway and later the automobile opened the way to today's sprawling agglomerations.

These brief notes already show that the relationship between urban development and transport is not simple and one way but complex and two way. On the one hand, spatial division of labor, i.e., the separation of locations of human activities in space, requires spatial interaction, i.e., travel and goods transport. On the other hand, the availability of transport infrastructure, such as roads, railways, and airlines, makes locations attractive as residences or business locations and so affects real estate markets and the choice of location of households and firms. Moreover, it becomes clear that the relationship between urban development and transport is closely linked to other urban processes, such as macroeconomic development, interregional migration, demography and household formation, and technological innovation.

In this chapter, one segment of the complex relationship between urban development and transport is discussed: the two-way interaction between urban land use and transport within urban regions. The macroeconomic dimension dealing with growth or decline of whole cities within urban systems is addressed in several other chapters, such as chapters ▶ "Interregional Input-Output Models" and ▶ "Interregional Trade: Models and Analyses". This chapter looks at integrated models of urban land use and transport, i.e., models which explicitly model the two-way interaction between land use and transport to forecast the likely impacts of land-use policies, such as zoning or building density or height constraints, and of transport policies, such as transport infrastructure investments, public transport improvements, or taxes or user charges, for decision support in urban planning. That excludes transport models per se which predict traffic patterns that result from different land-use configurations and land-use
change models that predict likely land-use changes that result from a particular transport system, as well as models that deal only with one urban subsystem, such as housing or business location.

The discussion proceeds from a review of the main theoretical approaches of land-use transport models and a brief overview of operational models to current debates and new challenges that are likely to influence future development in this field. There are in the literature several reviews of integrated land-use transport models, such as Wegener (2004) and Hunt et al. (2005).

2 Theory

Urban land-use transport models originated in the United States in the 1960s as part of the diffusion of operations research and systems theory into all fields of society. The first attempts to model the interaction between land use and transport were initiated by transport planners who felt that predicting future traffic flows without taking account of their impacts on location was inadequate. Hansen (1959) showed for Washington, DC, that locations with good accessibility had a higher chance of being developed, and at a higher density, than remote locations ("how accessibility shapes land use"). The recognition that mobility and location decisions co-determine each other and that therefore transport and land-use planning need to be coordinated led to the notion of the "land-use transport feedback cycle". The set of relationships implied by this term can be summarized as follows (Wegener and Fürst 1999, see Fig. 1):

• The distribution of land uses, such as residential, industrial, or commercial, over the urban area determines the locations of households and firms and so the locations of human activities such as living, working, shopping, education, and leisure.
• The distribution of human activities in space requires spatial interactions or trips in the transport system to overcome the distance between the locations of activities.
• These spatial interactions are based on decisions of travelers about car availability, number of trips, destination, mode, and route. They result in traffic flows and, in case of congestion, in increased travel times, trip lengths, and travel costs.
• Travel times, trip lengths, and travel costs create opportunities for spatial interactions that can be measured as accessibility.
• The spatial distribution of accessibility influences, among other attractiveness indicators, location decisions of investors and results in changes of the building stock by demolition, upgrading, or new construction.
• These changes in building supply determine location and relocation decisions of households and firms and thus the distribution of activities in space.

Fig. 1 The land-use transport feedback cycle (Wegener and Fürst 1999, p. 6)

This simple explanation pattern is used in many engineering-based and human-geography urban development theories. These start from origins and destinations, such as workers and workplaces, and from these infer trip volumes that best
reproduce observed trip frequency distributions. It had already been observed by Ravenstein (1885) and Zipf (1949) that the frequency of human interactions, such as messages, trips, or migrations between two locations (cities or regions), is proportional to their size but inversely proportional to their distance. The analogy to the law of gravitation in physics is obvious. The gravity model was the first spatial interaction model. Its physical analogy has later been replaced by better founded formulations derived from statistical mechanics (Wilson 1967) or information theory (Snickars and Weibull 1977). Only later did it become possible (Anas 1983) to link it via random utility theory (Domencich and McFadden 1975) to psychological models of human decision behavior.

From the spatial-interaction model, it is only a small step to its application as a location model. If it is possible to draw conclusions from the spatial distribution of human activities to the interactions between them, it must also be possible to identify the location of activities giving rise to a certain trip pattern. Wilson (1970) distinguishes four types of urban spatial interaction location models: unconstrained models, production-constrained models, attraction-constrained models, and doubly constrained models. Unconstrained models deal with households without fixed residence or workplace, production-constrained models with households looking for a job, and attraction-constrained models with households looking for a residence. The doubly constrained model is actually not a location model but the familiar transport model (see chapter ▶ "Travel Behavior and Travel Demand"). To give an example, the production-constrained spatial interaction model is written as follows:

    T_ij = A_i O_i D_j exp(−β c_ij)                               (1)

    A_i = 1 / Σ_j D_j exp(−β c_ij)                                (2)

    p_ij = D_j exp(−β c_ij) / Σ_j D_j exp(−β c_ij)                (3)

where T_ij are trips between zone i and zone j, O_i are trips generated by i and D_j trips attracted by j, and c_ij is the travel time or travel cost, or both, between i and j. The β is a parameter indicating the sensitivity to travel cost; because of its negative sign, more distant destinations are less likely to be selected. A_i is the so-called balancing factor ensuring that total trips equal O_i, and p_ij is the probability that a trip goes from i to j.

A second set of theories focuses on the economic foundations of land use. A fundamental assumption of all spatial economic theories is that locations with good accessibility are more attractive and have a higher market value than peripheral locations. This assumption goes back to von Thünen (1826) and has since been varied and refined in many ways (see chapter ▶ "Classical Contributions: Von Thünen and Weber"). Probably the most influential example of the latter kind is the model of the urban land market by Alonso (1964). The basic assumption of the Alonso model is that firms and households choose that location at which their bid rent, i.e., the land price they are willing to pay, equals the asking rent of the landlord, so that the land market is in equilibrium. The bid rent of firms results from the cost structure of their production function, i.e., sales price minus production and transport costs plus profit divided by size of land. A firm having a higher added value per unit of land is therefore able to pay a higher price than a firm with less intensive land utilization, everything else being equal. So it is not surprising that, say, jewelers are found in the center, whereas trucking companies have their yards on the periphery. Alonso's model has been the point of departure for a multitude of urban-economics model approaches. In more advanced variations of the model, restrictive assumptions, such as the monocentric city or perfect competition and complete information, have been relaxed (e.g., Anas 1982).

A third group of theories used in land-use transport models are social theories. In social theories of urban development, the spatial development of cities is the result of individual or collective appropriation of space. Based on an adaptation of evolutionist thoughts from philosophy (Spencer) and biology (Darwin), the Chicago school of urban sociologists interpreted the city as a multispecies ecosystem, in which social and economic groups fight for ecological positions. Appropriation of space takes place in the form of immigration of different ethnic or income groups or tertiary activities into residential neighborhoods, and concepts of animal and plant ecology, such as "invasion," "succession," or "dominance," are used to describe the phases of such displacement.

Social geography theories go beyond the macro perspective of social ecology by referring to age-, gender-, or social-group specific activity patterns which lead to
characteristic spatiotemporal behavior and hence to permanent localizations. Action space analyses (e.g., Chapin and Weiss 1968) identify the frequency of performance of activities reconstructed from daily space-time protocols as a function of distance to other activities and draw conclusions from this for the most probable allocation of housing, workplaces, shopping, and recreation facilities or, in other words, for the most likely level of spatial division of labor in cities. Hägerstrand (1970) made these ideas operational by the introduction of "time budgets," in which individuals, according to their social role, income, and level of technology (e.g., car ownership), command action spaces of different size and duration subject to three types of constraints: (i) capacity constraints, i.e., personal, nonspatial restrictions on mobility, such as monetary budgets, time budgets, availability of transport modes, and ability to use them; (ii) coupling constraints, i.e., restrictions on the coupling of activities by location and time schedules of facilities and other individuals; and (iii) institutional constraints, i.e., restrictions of access to facilities by public or private regulations such as property, opening hours, entrance fees, or prices. Only locations within the action spaces can be considered as destinations or permanent locations.

On the basis of Hägerstrand's action space theory, Zahavi (1974) proposed the hypothesis that individuals in their daily mobility decisions do not, as the conventional theory of travel behavior assumes, minimize travel time or travel cost needed to perform a given set of activities but instead maximize activities or opportunities that can be reached within their travel time and money budgets.
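
As a worked illustration of Eqs. (1), (2), and (3), the following Python sketch allocates the trips generated by three hypothetical origin zones over three destination zones; all trip totals, attractions, costs, and the β value are invented for the example.

```python
import math

O = [100.0, 50.0, 80.0]                 # trips generated by origin zones i
D = [120.0, 60.0, 50.0]                 # attraction of destination zones j
c = [[1, 4, 6],
     [4, 1, 3],
     [6, 3, 1]]                          # travel costs c_ij
beta = 0.5                               # cost-sensitivity parameter

T = []
for i, Oi in enumerate(O):
    weights = [Dj * math.exp(-beta * c[i][j]) for j, Dj in enumerate(D)]
    Ai = 1.0 / sum(weights)              # balancing factor, Eq. (2)
    T.append([Oi * Ai * w for w in weights])  # trips T_ij, Eq. (1)
    # the probabilities p_ij of Eq. (3) are simply Ai times the weights

for i, row in enumerate(T):
    print(f"zone {i}: {[round(t, 1) for t in row]} (row sum = {sum(row):.1f})")
# Each row sums to O_i, confirming the production constraint.
```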

3 Operational Models

Lowry’s (1964) Model of Metropolis was the first attempt to quantify the land-use transport feedback cycle in one integrated model. The model consists of two singly constrained spatial-interaction location models, a residential location model and a service and retail employment location model, nested into each other. In modern notation, the two models would be written as   Ri exp βcij T ij ¼ X   Ej Ri exp βcij

(4)

i

  W j exp βcij S ij ¼ X   Pi W j exp βcij

(5)

i

where T_ij are work trips between residential zone i and work zone j and S_ij shopping trips from residential zone i to retail facilities in zone j. E_j are workers in j and P_i population in i to be distributed, and R_i are dwellings in i and W_j shopping facilities
in j used as destinations in the two spatial interaction models, and c_ij is the travel time between i and j. In the first iteration, only work trips to the workplaces of basic industries, i.e., industries exporting to other regions and not serving the local population, are modeled. The two spatial interaction location models are linked by assumptions about how many people are supported by one worker and how many retail employees are supported by one resident. In each subsequent iteration, workers and residents are updated until they no longer change, i.e., until the system is in equilibrium (a toy illustration of this mechanism is sketched below).

The Lowry model stimulated a large number of increasingly complex land-use transport models in the USA and not much later also in Europe. Many of these early models were not successful because of unexpected difficulties of data collection and calibration and the still imperfect computer technology of the time. More important, however, was that the models were mainly oriented toward urban growth and the efficiency of the transport system and had nothing to say about the ethnic and social conflicts arising in US cities at that time. Moreover, the models were committed to the paradigm of synoptic rationalism in planning theory, which was increasingly replaced by incremental, participatory forms of planning. In his "Requiem for Large Scale Models," Lee (1973) accused the models of "seven sins": hypercomprehensiveness, grossness, mechanicalness, expensiveness, hungriness, wrongheadedness, and complicatedness.

But many of the technical problems of the early models were solved by better data availability and faster computers. The spatial and substantive resolution of the models was increased, and they were based on better theories, such as bid-rent theory, discrete choice theory, and user equilibrium in transport networks (see chapter ▶ "Network Equilibrium Models for Urban Transport"). In addition, better visualization techniques made the results of the models better understood by citizens and policy makers. A new generation of models paid more attention to aspects of social equity.

The 1990s brought a revival in the development of urban land-use transport models. New environmental legislation in the USA required that cities applying for federal funds for transport investments demonstrate the likely impacts of their projects on land use. This had the effect that virtually all major metropolitan areas in the USA maintained an integrated land-use transport model. In Europe, the European Commission initiated a large research program The City of Tomorrow, in which integrated land-use transport models were applied in several research projects (Marshall and Banister 2007). Several integrated land-use transport models were applied in a growing number of metropolitan areas. New developments in data availability brought about by geographical information systems (GIS) and further advances in computer technology have removed former technical barriers.

It is impossible to present here all operational integrated land-use transport models existing in the world today. Instead a classification of models by the way they implement the feedback from transport to land use is proposed using a few examples, recognizing that in each group, there exists a great variety of approaches.
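
The sketch below makes the nested Lowry mechanism of Eqs. (4) and (5) concrete for two hypothetical zones: basic employment is allocated to residences, residents generate retail employment, and the loop repeats until the activity totals no longer change. All zone data, the β value, and the two multipliers are invented; here the product of the multipliers is 0.5, so the multiplier chain converges.

```python
import math

basic_emp = [1000.0, 200.0]   # basic (export) employment by zone
R = [1.0, 1.0]                # dwellings: residential attractiveness R_i
W = [1.0, 0.5]                # shopping facilities: retail attractiveness W_j
c = [[1.0, 3.0],
     [3.0, 1.0]]              # symmetric travel times c_ij
beta = 0.4
pop_per_worker = 2.5          # people supported by one worker
retail_per_resident = 0.2     # retail employees supported by one resident

def allocate(origin_totals, attract, c, beta):
    """Singly constrained spatial interaction allocation (cf. Eqs. 4-5)."""
    out = [0.0] * len(attract)
    for o, total in enumerate(origin_totals):
        w = [attract[d] * math.exp(-beta * c[o][d]) for d in range(len(attract))]
        s = sum(w)
        for d in range(len(attract)):
            out[d] += total * w[d] / s
    return out

emp = basic_emp[:]
for iteration in range(100):
    pop = [x * pop_per_worker for x in allocate(emp, R, c, beta)]   # Eq. (4)
    shoppers = allocate(pop, W, c, beta)                            # Eq. (5)
    new_emp = [b + retail_per_resident * s for b, s in zip(basic_emp, shoppers)]
    if max(abs(a - b) for a, b in zip(new_emp, emp)) < 1e-6:
        break
    emp = new_emp
print("equilibrium employment:", [round(e, 1) for e in emp])
print("equilibrium population:", [round(p, 1) for p in pop])
```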

3.1 Spatial Interaction Location Models

Spatial interaction location models retain the original Lowry concept by modeling the location of human activities as destinations of trips using the production-constrained spatial interaction model. The most prominent urban model of this kind was the MEPLAN model developed by Echenique (1985). Current examples of operational models are TRANUS (de la Barra 1989) and PECAS (Hunt and Abraham 2005). These models use a multi-industry, multiregional input–output framework (see chapter ▶ "Interregional Input-Output Models") to predict the locations of production and consumption in the urban region, where households of different types are treated as industries producing labor and consuming commodities. By iterating between the land-use parts and the transport parts of the models, general equilibrium between transport costs (including congestion) and land and commodity prices is achieved. The core equation of MEPLAN is

    X_irs = X_ir A_ir f(c_ir + g_irs) Z_is                        (6)

where X_irs are deliveries of industry i from region r to region s, X_ir is the supply of goods of industry i in r and Z_is the demand for such products in s, and c_ir are unit production costs of such products in r and g_irs their unit transport costs from r to s. A_ir is the balancing factor as in Eq. (1) ensuring that total trade flows from region r equal production in r.

The great advantage of spatial-interaction location models is their firm foundation in economic theory with respect to production and consumption. One possible criticism is that households are treated as industries producing labor and consuming commodities, with the consequence that residential location solely depends on workplace location, as if workers decided where to live on their way back from work.

In his most recent model RELU-TRAN, Anas reverses the causal direction of the input–output framework by modeling the location choices of consumers (households), producers (firms), landlords, and developers separately, with utility functions for households that include the costs of budget-constrained trips and production functions for firms including interindustry links as generated by the transport part of the model. As in the input–output models, by iterating between the land-use and transport parts of the model, general equilibrium between land use and transport is achieved (Anas and Liu 2007). In PECAS the spatial interaction part represents a short-term equilibrium and is complemented by a parcel-based model of actions of developers (Hunt and Abraham 2005).
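
A toy numerical reading of Eq. (6) for one industry and two regions; the supplies, demands, combined unit costs, and the negative-exponential form assumed here for the deterrence function f(·) are all illustrative.

```python
import math

X = [100.0, 60.0]                 # supply X_ir of one industry in regions r
Z = [90.0, 70.0]                  # demand Z_is in regions s
cost = [[1.0, 2.5],
        [2.5, 1.0]]               # combined unit costs c_ir + g_irs
beta = 0.8
f = lambda c: math.exp(-beta * c)  # assumed deterrence function f(.)

flows = []
for r in range(2):
    A = 1.0 / sum(f(cost[r][s]) * Z[s] for s in range(2))  # balancing A_ir
    flows.append([X[r] * A * f(cost[r][s]) * Z[s] for s in range(2)])

# Each row sums to the region's supply X_ir: the balancing factor
# guarantees that total trade flows from r equal production in r.
print([[round(v, 1) for v in row] for row in flows])
```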

3.2 Accessibility-Based Location Models

The second group of land-use transport models predicts not actual spatial interactions but the opportunity for spatial interactions at potential locations. The indicator of opportunity for spatial interactions is called accessibility. Accessibility indicators
can take a wide range of forms, from simple accessibility indicators, such as distance to the nearest bus station or motorway exit, to complex indicators measuring the ease of reaching all destinations of interest. The most frequently used complex accessibility indicator is potential accessibility, or the total of all destinations of interest weighted by an inverse function of the effort to reach them, measured in time or cost or a combination of both as "generalized cost":

    A_i = Σ_j D_j exp(−β c_ij)                                    (7)

where A_i is the potential accessibility of zone i with respect to destinations of interest D_j and c_ij is the generalized cost of travel between i and j. The inverse similarity with the balancing factor of Eq. (2) is obvious.

Examples of operational accessibility-based location models in use today are IRPUD (Wegener 1982, 2018), RURBAN (Miyamoto and Kitazume 1989; Miyamoto and Udomsri 1996; Miyamoto et al. 2007), MUSSA (Martinez 1996, 2018), DELTA (Simmonds 1999), and UrbanSim (Waddell 2002). These models predict location choices of households and firms with discrete choice models using multi-attribute utility functions in which accessibility indicators are combined with other attributes of potential locations to indicate their attractiveness from the point of view of households looking for a residential location or firms looking for a business location. In that respect, these models build on the bid-rent approach of Alonso (1964), although equilibrium between asking rents and bid rents on the land market is achieved only in RURBAN, MUSSA, and DELTA, whereas IRPUD keeps land prices fixed during a simulation period and defers the price response of landlords to the next simulation period. As an example of accessibility-based location choice, the allocation of housing demand to vacant residential land by a multinomial logit model in the IRPUD model is shown (Wegener 2018):

    C_kli(t, t+1) = C_k(t, t+1) · L_kli exp[β_k u_kli(t)] / Σ_il L_kli exp[β_k u_kli(t)]   (8)

where C_k(t, t+1) are new dwellings of type k developers plan to build in the whole region between time t and t+1, C_kli(t, t+1) are dwellings of that type that will be built on land-use category l in zone i in that period, and L_kli is the capacity of vacant land for such dwellings given zoning and building density and height constraints. The parameters β_k indicate the selectivity of developers with respect to the attractiveness u_kli(t) of land-use category l in zone i for dwellings of housing type k:

    u_kli(t) = [u_ki(t)]^v_k [u_kl(t)]^w_k [u(c_kli)(t)]^(1−v_k−w_k)              (9)

where u_ki(t) is the attractiveness of zone i as a location for housing type k, u_kl(t) is the attractiveness of land-use category l for housing type k, and u(c_kli)(t) is the attractiveness of the land price of land-use category l in zone i in relation to the expected rent or price of the dwelling. The v_k, w_k, and 1−v_k−w_k are multiplicative weights
adding up to unity. The zonal attractiveness u_ki(t) is multi-attribute and contains, besides other indicators of neighborhood quality, one or more types of accessibility indicators.

The advantage of accessibility-based location models is that by inserting different types of accessibility indicators into the utility functions of different types of locators, the great diversity of accessibility needs reflecting different lifestyles and preferences of households and different communication and transport needs of firms can be considered. Their disadvantage is that the actual travel and transport behavior, and hence actual travel times and transport costs, become known only in the next iteration of the associated transport model, but this may be acceptable because they change over time only gradually.

The separation of the land-use and transport parts of the model by the accessibility interface makes it easier to develop custom-tailored submodels of the location behavior of individual groups of actors, such as households looking for a dwelling, landlords looking for a tenant, developers considering upgrading of their housing stock or looking for vacant land for new residential buildings, or firms looking for vacant floorspace or for land to build new floorspace. This has important implications for the software organization of the models. While spatial-interaction location models as described in the previous section tend to be "unified," i.e., to consist of one single complex algorithm designed to achieve general equilibrium, the accessibility-based models described in this section tend to be "composite," i.e., to consist of several interlinked modules, each serving a specific purpose, modeling the behavior of a particular group of actors and using the accessibility indicators most appropriate for that. Several accessibility-based LUTI models, such as RURBAN, MUSSA, DELTA, and UrbanSim, originally did not contain transport submodels but were linked to existing transport models; recently transport submodels were developed for DELTA and UrbanSim.
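
The chain from the accessibility of Eq. (7) through the multi-attribute utility of Eq. (9) to the logit allocation of Eq. (8) can be sketched in a few lines of Python; every zone, land-use category, attractiveness value, capacity, and weight below is hypothetical, and the simple scaling of accessibility into zonal attractiveness is a stand-in for what an operational model would actually do.

```python
import math

# Two zones, destinations of interest D_j and generalized costs c_ij.
D = [500.0, 300.0]
c = [[1.0, 3.0],
     [3.0, 1.0]]
beta = 0.3

# Eq. (7): potential accessibility of each zone.
A = [sum(Dj * math.exp(-beta * c[i][j]) for j, Dj in enumerate(D))
     for i in range(2)]

# Hypothetical attractiveness components for one housing type k.
u_zone = [a / max(A) for a in A]                      # u_ki(t)
u_landuse = {"greenfield": 0.6, "brownfield": 0.9}    # u_kl(t)
u_price = {("greenfield", 0): 0.8, ("greenfield", 1): 0.9,
           ("brownfield", 0): 0.5, ("brownfield", 1): 0.7}  # u(c_kli)(t)
v, w = 0.4, 0.3                                       # weights v_k and w_k
L = {("greenfield", 0): 200, ("greenfield", 1): 50,
     ("brownfield", 0): 80, ("brownfield", 1): 120}   # capacities L_kli
beta_k, C_total = 2.0, 100       # selectivity and planned dwellings C_k

def u(l, i):
    """Eq. (9): multiplicative multi-attribute attractiveness."""
    return (u_zone[i] ** v) * (u_landuse[l] ** w) * (u_price[(l, i)] ** (1 - v - w))

# Eq. (8): multinomial logit allocation of the C_total dwellings.
weights = {key: L[key] * math.exp(beta_k * u(*key)) for key in L}
total = sum(weights.values())
alloc = {key: C_total * wt / total for key, wt in weights.items()}
print({key: round(n, 1) for key, n in alloc.items()})  # sums to C_total
```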

4 Current Debates

The urban models sketched so far represent the main model types coexisting until the end of the 1990s. However, from then on, the urban modeling scene has become increasingly fragmented along two dividing lines. The first divide runs between equilibrium modeling approaches and models that attempt to capture the dynamics of urban processes. The second, more recent, divide runs between aggregate macro-analytic approaches and new microscopic agent-based models.

4.1 Equilibrium or Dynamics

The first urban models were static equilibrium models, such as the Lowry model, which generated an "instant metropolis" at a point in time in the future. This tradition was maintained and is still strong in urban-economics models based on the notion that all markets, including urban housing, real estate, and transport markets, tend to
move toward equilibrium between demand and supply and that therefore the equilibrium state is the most appropriate guidance for urban planning. In contrast to this view, a different movement in urban modeling has become more interested in the adjustment processes going on in cities that may lead to equilibrium but more frequently do not. The proponents of this movement, influenced by systems theory and complexity theory, argue that cities have evolved over a long time and display a strong inertia which resists sudden changes toward a desired optimum or equilibrium (see chapter ▶ "Spatial Dynamics and Space-Time Data Analysis"). Following this view, urban change processes can be classified as slow, medium speed, and fast (Wegener et al. 1986):

• Slow Processes: Construction. Urban transport, communications, and utility networks are the most permanent elements of the physical structure of cities. The land-use distribution is equally stable; it changes only incrementally. Buildings have a life-span of up to 100 years and take several years from planning to completion.
• Medium-Speed Processes: Economic, Demographic, and Technological Change. The most significant kind of economic change are changes in the number and sectoral composition of employment. Demographic changes affect population through births, ageing, and death and households through household formation and dissolution. Technological change affects all aspects of urban life, in particular transport and communication. These changes do not affect the physical structure of the city but the way it is used.
• Fast Processes: Mobility. There are even more rapid processes that are planned and completed in less than a year's time. They refer to the mobility of people, goods, and information within and between given buildings and communication facilities. These changes range from job relocations and residential moves to the daily pattern of trips and messages.

The advocates of dynamic models argue that in order to make realistic forecasts, it is necessary to explicitly take account of the different speeds of processes. In particular, they criticize the implicit assumption of spatial interaction location models that households and firms are perfectly elastic in their location behavior and change to the equilibrium spatial configuration as if there were no transaction costs of moving. In contrast, dynamic urban models make the evolution of the urban system through time explicit. Early dynamic urban models (Harris and Wilson 1978; Allen et al. 1981) treated time as a continuum. Today the most common form are recursive or quasi-dynamic models in which the end state of one simulation period serves as the initial state of the subsequent period. The length of the simulation period, usually one year, is the implicit time lag of the model, as changes occurring in one simulation period affect other changes only in the next simulation period. By using results from earlier simulation periods, the modeler can implement longer delays and feedbacks. For instance, if it is assumed that it typically takes three years to plan and build a house, a delay of three years between residential investment decisions and the new dwellings appearing on the market would be appropriate.
Similar delays between investment decision and completion make it possible to model the typical cycles of over- and undersupply of office space.

Most current dynamic urban models are composite models, i.e., they operate with a combination of custom-tailored submodels for different urban change processes. By selecting the sequence in which these submodels are processed during a simulation period, the modeler can give certain processes priority access to scarce resources. It is no coincidence that most dynamic land-use models are accessibility-based location models, i.e., they use accessibility indicators as the link between transport and land use and so take advantage of the possibility to select different types of accessibility for different types of development. Most existing equilibrium urban models, however, are unified, i.e., they apply one algorithm to all their parts, such as spatial interaction location in the case of MEPLAN, TRANUS, and PECAS, or bid-rent location in the case of MUSSA, because they aim at general equilibrium between supply and demand, which is easier to achieve in a unified model.

However, the growing success of dynamic or quasi-dynamic models has had its effects on equilibrium models. Some spatial-interaction location models, such as MEPLAN and PECAS, have been made recursive, i.e., they are processed not only for a distant target year but for years in between, and have been complemented by developer submodels producing residential, commercial, and industrial floorspace that serve as constraints for the allocation of households and economic activity in the equilibration of the subsequent simulation period (Echenique et al. 2013; Jin et al. 2013).
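
A minimal sketch of such a recursive model with an explicit three-year lag between residential investment decisions and completions; the demand growth rate and the developers' response coefficient are arbitrary assumptions, chosen only to show how the lag alone can generate cycles of over- and undersupply.

```python
LAG = 3                   # years from investment decision to completion
decisions = []            # dwellings decided in each simulated year
stock, demand = 1000.0, 1100.0

for t in range(10):
    # Developers respond to the currently visible excess demand,
    # unaware of what is already in the construction pipeline.
    decided = max(0.0, 0.5 * (demand - stock))
    decisions.append(decided)
    # Dwellings decided LAG years ago reach the market this year.
    completed = decisions[t - LAG] if t >= LAG else 0.0
    stock += completed
    demand *= 1.01        # exogenous demand growth
    print(f"year {t}: decided={decided:6.1f} "
          f"completed={completed:6.1f} stock={stock:7.1f}")
```

Running the loop shows the stock first lagging behind demand and then overshooting it, because three cohorts of decisions are already in the pipeline when the first completions arrive.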

4.2 Macro or Micro

The second major divide appearing in the urban modeling scene concerns the debate about the most appropriate level of spatial and substantive disaggregation. The first urban models were zone-based like the travel models of the time, as the data required by both types of models were available only for relatively large statistical areas. However, in the 1990s, the growth in computing power and the availability of GIS-based disaggregate data, fuelled by non-modeling applications such as data capture, mapping, spatial analysis, and visualization, had its impact on urban modeling. New modeling techniques, such as cellular automata (CA) and agent-based models developed and applied in the environmental sciences, were proposed for modeling land-use changes of high-resolution grid cells (see chapter ▶ "Cellular Automata and Agent-Based Models"). In transport planning, activity-based models, which model no longer individual trips but activity-related multi-stop tours, have become the state of the art (see chapter ▶ "Activity-Based Analysis"). The impact of these developments on urban modeling has been a massive and still continuing trend toward disaggregation to the individual level, or microsimulation. There are important conceptual reasons for microsimulation, such as improved theories and growing knowledge about human cognition, preferences, behavior under uncertainty and constraints, and interactions between individuals in households, groups, and social networks (see chapter ▶ "Social Network Analysis"), as well as a growing potential for individualization and the choice of diversified lifestyles and hence
mobility and location patterns. Disaggregate models of individual behavior are better suited to capture this heterogeneity. Microsimulation was first used in the social sciences by Orcutt et al. (1961). Early applications with a spatial dimension covered a wide range of processes, such as spatial diffusion and urban expansion (see chapter ▶ “Spatial Microsimulation”). Since the 1980s, several microsimulation models of urban land use and transport have been developed, such as the pioneering ILUTE (Salvini and Miller 2005). Stimulated by the technical and conceptual advances discussed above, agent-based microsimulation urban models are proliferating all over the world, including microsimulation versions of originally aggregate models, such as IRPUD, DELTA, and UrbanSim. However, not all disaggregate urban modeling projects have been successful (see, for instance, Wagner and Wegener 2007; Nguyen-Luong 2008). Many large modeling projects had to reduce their too ambitious targets. The reasons for these failures are partly practical, such as large data requirements and long computing times, but partly also conceptual. The most important conceptual problem is the lack of stability of microsimulation models due to stochastic variation. Stochastic variation, also called microsimulation or Monte Carlo error, is the variation in model results between simulation runs with different random number seeds (see chapter ▶ “Spatial Microsimulation”). In agentbased models of choice behavior, the magnitude of stochastic variation is a function of the ratio between the number of choices and the number of alternatives and the selectivity of the choosing agents (the β parameter in the equations of this chapter). The stochastic variation is small when a large number of agents with clear preferences choose between few alternatives, e.g., travel modes. It is large when a small number of agents with less pronounced preferences choose between a large number of alternatives, e.g., locations, such as grid cells, parcels, or zones, as in the case of residential or business location. In that case, the stochastic noise may be larger than the differences between competing planning alternatives under investigation, and the results may convey an illusionary sense of precision (Wegener 2011). There are several ways to overcome this dilemma, such as averaging the results to a higher spatial level or to artificially increasing the number of choices in the model. The most frequently recommended method is to run the model several times and to average across the results of the different runs, something rarely done because of the already long computation times of microsimulation models. In conclusion, the microsimulation community has yet to find a proper answer to the stochastic variation problem. The optimum level of disaggregation may not be the most disaggregate one. What is needed is a theory of multilevel urban models to identify the appropriate level of conceptual, spatial, and temporal resolution for each modeling task.
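The dependence of this Monte Carlo error on the number of agents and on their selectivity is easy to reproduce in a toy experiment. In the sketch below (all parameters are invented), groups of agents choose among 50 zones with a multinomial logit model, and zone totals are compared across 20 random number seeds.

```python
import numpy as np

def simulate_choices(n_agents, utilities, beta, seed):
    """Monte Carlo run: agents draw zones from multinomial logit probabilities."""
    rng = np.random.default_rng(seed)
    p = np.exp(beta * utilities)
    p /= p.sum()
    return rng.multinomial(n_agents, p)

utilities = np.linspace(0.0, 1.0, 50)   # 50 candidate zones, illustrative utilities
for n_agents, beta in [(100, 0.5), (100, 5.0), (10000, 0.5)]:
    runs = np.array([simulate_choices(n_agents, utilities, beta, seed)
                     for seed in range(20)])
    # Coefficient of variation of zone totals across seeds, averaged over
    # zones with nonzero mean occupancy: a simple measure of Monte Carlo error.
    mean = runs.mean(axis=0)
    cv = (runs.std(axis=0)[mean > 0] / mean[mean > 0]).mean()
    print(f"agents={n_agents:5d}, beta={beta:3.1f}: mean CV across seeds = {cv:.2f}")
```

With few agents, many alternatives, and a small selectivity parameter, the variation across seeds is large — exactly the residential or business location situation described above; increasing the number of agents or the selectivity parameter shrinks it.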

5 Future Challenges

The world is changing fast, and so are the problems of urban planning. The first land-use transport models were growth-oriented and mainly addressed technical problems, such as the reduction of urban sprawl and traffic congestion. The second generation of models increasingly considered equity aspects, such as social and ethnic segregation, accessibility of public facilities, and distributive issues, such as who gains and who loses if certain policies are implemented. Today the third generation of models tries to take account of the observed individualization of lifestyles and preferences by ever greater spatial, temporal, and substantive disaggregation. However, new challenges are becoming visible that cannot be handled by many of the urban land-use transport models existing today.

The first challenge is to extend the models from land-use transport interaction models to land-use transport environment models. Today only a few urban models are linked to environmental models to show the impacts of planning policies on greenhouse gas emissions, air quality, traffic noise, and open space (Lautso et al. 2004). As environmental submodels predicting air quality or noise propagation require high-resolution grid cell data, this model extension may give a new twist to the macro versus micro debate toward multilevel models using different spatial levels with different resolutions and upward and downward feedbacks. Even fewer models are able to model the reverse relationship, the impact of environmental quality, such as air quality or traffic noise, on location.

The second challenge is the transition from population growth to population decline already observed and foreseeable in many European cities. With small population decline and moderate economic growth, there is still demand for new housing because of decreasing household size and increasing floorspace per capita. The same is true for work places due to growing floorspace demand per worker. However, if the losses of population and employment become larger than the growth in floorspace demand per capita or per worker, the task is no longer the allocation of growth but the management of decline by new types of policies, such as rehabilitation of neighborhoods, upgrading of run-down housing, or conversion or demolition of derelict or vacant buildings. Only a few current urban models are able to handle this.

The third and greatest challenge arises from the possibility of future energy crises and the requirements of climate protection. Both are likely to make mobility significantly more expensive. For model design, it does not matter whether car trips become more expensive through higher prices of fossil fuels on the world market or through government policies to meet greenhouse gas reduction targets. What matters is that these targets cannot be achieved without rigorous changes in the framework conditions of land use and transport in urban areas, in particular without significant increases in the price of fossil fuels. Most current urban models are not prepared for this. Many of them are not able to model transport policies, such as carbon taxes, emissions trading, road pricing, or alternative vehicles and fuels, or land-use policies, such as strict development controls, improvement of the energy efficiency of buildings, or decentralized energy generation. Even fewer models are able to identify population groups or neighborhoods most affected by such policies or possible problems with access to basic services, such as schools or health facilities, or participation in social and cultural life in low-density suburban or rural areas.


Many current transport models cannot correctly predict the impacts of substantial fuel price increases. Many do not consider travel costs in modeling car ownership, trip generation, trip distribution, and modal choice. Many do not forecast induced or suppressed trips. Many use price elasticities estimated in times of cheap energy. Many do not consider household budgets for housing and travel. Action space theory with explicit travel time and travel cost budgets makes it possible to predict what will happen if the speed and cost of travel are changed by environment-oriented planning policies. Acceleration and cost reduction in transport lead to more, faster, and longer trips; speed limits and higher costs lead to fewer, slower, and shorter trips. In the long run, this has effects on the spatial structure. Longer trips make more dispersed locations and a higher degree of spatial division of labor possible; shorter trips require a better spatial coordination of locations. However, making travel slower and more expensive does not necessarily lead to a reconcentration of land uses back to the historical city center. In many urban regions, population has already decentralized so much that further deconcentration of employment would be more effective in achieving shorter trips than reconcentration of population. That plausible forecasts of the impacts of substantial energy price increases can be made with land-use transport models based on action space theory was demonstrated by the results of the EU project Scenarios for the Transport System and Energy Supply and their Potential Effects (STEPs): with appropriate combinations of transport and land-use policies, significant reductions in greenhouse gas emissions can be achieved without unacceptable loss of quality of life (Fiorello et al. 2006). More research on urban models and climate change is going on worldwide.
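The direction of these effects follows from simple arithmetic. If a household's monthly travel expenditure budget is taken as roughly constant, as travel cost budget reasoning assumes, a fuel price increase translates directly into fewer, slower, or shorter car trips; the numbers below are invented for illustration.

```python
# Back-of-the-envelope illustration of a constant travel cost budget;
# all numbers are invented assumptions.
monthly_budget = 150.0   # currency units the household spends on car travel
base_cost_per_km = 0.15  # currency units per km at current fuel prices

for increase in [0.0, 0.25, 0.50, 1.00]:  # fuel price rise
    km = monthly_budget / (base_cost_per_km * (1.0 + increase))
    print(f"fuel price +{increase:4.0%}: {km:5.0f} km of car travel per month")
```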

6 Conclusions

After half a century of development, there exists today a broad spectrum of mathematical models to predict the spatial evolution of cities subject to exogenous trends and land-use and transport policies. These models build on a range of theories from transport planning, urban economics, and social geography to explain the complex two-way interaction between urban land use and transport, i.e., the location of households and firms and the resulting mobility patterns in urban regions subject to concurrent economic, demographic, and technological developments. Stimulated by advances in data availability, theory development, and computing technology, these models have reached an impressive level of sophistication and operational applicability. However, the urban modeling field has recently become divided into camps with different modeling philosophies. In particular, two dividing lines are becoming visible: One is the divide between equilibrium approaches, which assume that cities are essentially markets moving toward equilibrium between demand and supply, and dynamic approaches focusing on adjustment processes of different speeds. The other is the divide between macro approaches dealing with statistical aggregates at the level of zones and micro approaches modeling individual households and firms at the level of grid cells or parcels. In each of the two debates, the advantages and disadvantages of the competing approaches are obvious, but what is missing is an open and honest assessment of their relevance for the validity and robustness of the results of the models. Collaborative research projects in which different models are applied to identical problems and their results compared by meta-analyses are still the exception.

A second issue regarding the future of urban models is the new challenges for urban planning. The growing importance of environmental impacts of land-use and transport policies has not yet been fully embraced by most urban models. Neither has the transition from population growth to population decline already observed or foreseeable in many cities, a great challenge for some models originally designed for allocating growth. But the greatest challenge for urban models will be how to cope with the combined effects of future energy scarcity and the imperatives of climate change. During and after the energy transition, energy for transport and building heating will no longer be abundant and cheap but scarce and expensive. This will have fundamental consequences for mobility and location. Land-use transport models which are calibrated on a purely statistical basis on behavior observed in times of cheap energy and do not consider the costs of travel and location in relation to household income cannot adequately forecast these consequences. To deal with significantly rising energy costs, land-use transport models must consider the basic needs of households, which can be assumed to remain relatively constant over time, such as shelter and security at home; accessibility of work, education, retail, and necessary services; and the constraints on housing and travel expenditures by disposable household incomes. To avoid the danger that the models, as in the 1970s, are again rejected by planning practice, they must give up some long-standing traditions and be prepared to adopt new modeling principles: less extrapolation of past trends but more openness to fundamental change, less reliance on observed behavior but more theory on needs, less consideration of preferences and choices but more taking account of constraints, and less effort on detail but more focus on basic essentials.

References

Allen PM, Sanglier M, Boon F (1981) Models of urban settlement and structure as self-organizing systems. US Department of Transportation, Washington, DC
Alonso W (1964) Location and land use. Harvard University Press, Cambridge, MA
Anas A (1982) Residential location models and urban transportation: economic theory, econometrics, and policy analysis with discrete choice models. Academic, New York
Anas A (1983) Discrete choice theory, information theory and the multinomial logit and gravity models. Transp Res B 17(1):13–23
Anas A, Liu Y (2007) A regional economy, land use and transportation model (RELU-TRAN): formulation, algorithm design and testing. J Reg Sci 47(3):415–455
Chapin FS, Weiss SF (1968) A probabilistic model for residential growth. Transp Res 2(4):375–390
de la Barra T (1989) Integrated land use and transport modelling. Cambridge University Press, Cambridge
Domencich TA, McFadden D (1975) Urban travel demand: a behavioral analysis. North Holland, Amsterdam
Echenique MH (1985) The use of integrated land use transportation planning models: the cases of Sao Paulo, Brazil and Bilbao, Spain. In: Florian M (ed) The practice of transportation planning. Elsevier, The Hague, pp 263–286
Echenique MH, Grinevich V, Hargreaves AJ, Zachariadis V (2013) LUISA: a land-use interaction with social accounting model; presentation and enhanced calibration method. Environ Plann B 40:1003–1026
Fiorello D, Huismans G, López E, Marques C, Monzon A, Nuijten A, Steenberghen T, Wegener M, Zografos G (2006) Transport strategies under the scarcity of energy supply. STEPs final report. Buck Consultants International, The Hague
Hägerstrand T (1970) What about people in regional science? Pap Reg Sci Assoc 24(1):7–21
Hansen WG (1959) How accessibility shapes land use. J Am Inst Plann 25(2):73–76
Harris B, Wilson AG (1978) Equilibrium values and dynamics of attractiveness terms in production-constrained spatial-interaction models. Environ Plann A 10(4):371–388
Hunt JD, Abraham JE (2005) Design and implementation of PECAS: a generalised system for the allocation of economic production, exchange and consumption quantities. In: Lee-Gosselin MEH, Doherty ST (eds) Integrated land-use and transportation models: behavioural foundations. Elsevier, St. Louis, pp 253–274
Hunt JD, Kriger DS, Miller EJ (2005) Current operational urban land-use transport modeling frameworks: a review. Transp Rev 25(3):329–376
Jin Y, Echenique M, Hargreaves A (2013) A recursive spatial equilibrium model for planning large-scale urban change. Environ Plann B 40:1027–1050
Lautso K, Spiekermann K, Wegener M, Sheppard I, Steadman P, Martino A, Domingo R, Gayda S (2004) PROPOLIS: planning and research of policies for land use and transport for increasing urban sustainability. PROPOLIS final report. LT Consultants, Helsinki
Lee DB (1973) Requiem for large-scale models. J Am Inst Plann 39(3):163–178
Lowry IS (1964) A model of metropolis. RM-4035-RC. Rand Corporation, Santa Monica
Marshall S, Banister D (eds) (2007) Land use and transport. European research towards integrated policies. Elsevier, London
Martinez FJ (1996) MUSSA: land use model for Santiago City. Transp Res Rec 1552/1996:126–134
Martinez FJ (2018) Microeconomic modeling in urban science. Academic Press, Elsevier, St. Louis
Miyamoto K, Kitazume K (1989) A land use model based on random utility/rent-bidding analysis (RURBAN). Selected proceedings of the fifth world conference on transport research (4), pp 107–121. WCTRS, University of Leeds, Leeds
Miyamoto K, Udomsri R (1996) An analysis system for integrated policy measures regarding land use, transport and the environment in a metropolis. In: Hayashi Y, Roy J (eds) Transport, land use and the environment. Kluwer, Dordrecht, pp 259–280
Miyamoto K, Vichiesan V, Sugiki N, Kitazume K (2007) Application of RURBAN integrated with a travel model in detailed zone system. Selected proceedings of the 11th world conference on transport research. WCTRS, University of Leeds, Leeds
Nguyen-Luong D (2008) An integrated land-use transport model for the Paris Region (SIMAURIF): ten lessons learned after four years of development. IAURIF, Paris
Orcutt G, Greenberger M, Rivlin A, Korbel J (1961) Microanalysis of socioeconomic systems: a simulation study. Harper and Row, New York
Ravenstein EG (1885) The laws of migration. J Stat Soc Lond 48(2):167–235
Salvini PA, Miller EJ (2005) ILUTE: an operational prototype of a comprehensive microsimulation model of urban systems. Network Spatial Econ 5(2):217–234
Simmonds DC (1999) The design of the DELTA land-use modelling package. Environ Plann B Plann Des 26(5):665–684
Snickars F, Weibull JW (1977) A minimum information principle. Reg Sci Urban Econ 7(1–2):137–168
von Thünen JH (1826) Der isolierte Staat in Beziehung auf Landwirtschaft und Nationalökonomie. Perthes, Hamburg
Waddell P (2002) UrbanSim: modeling urban development for land use, transportation and environmental planning. J Am Plan Assoc 68(3):297–314
Wagner P, Wegener M (2007) Urban land use, transport and environment models: experiences with an integrated microscopic approach. disP 170(3):45–56
Wegener M (1982) Modeling urban decline: a multilevel economic-demographic model of the Dortmund region. Int Reg Sci Rev 7(2):217–241
Wegener M (2004) Overview of land-use transport models. In: Hensher DA, Button KJ (eds) Transport geography and spatial systems, Handbook 5 of handbooks in transport. Pergamon/Elsevier Science, Kidlington, pp 127–146
Wegener M (2011) From macro to micro – how much micro is too much? Transp Rev 31(2):161–177
Wegener M (2018) The IRPUD model. Arbeitspapier 18/01. Spiekermann & Wegener Stadt- und Regionalforschung, Dortmund
Wegener M, Fürst F (1999) Land-use transport interaction: state of the art. Berichte aus dem Institut für Raumplanung 46. Institute of Spatial Planning, University of Dortmund, Dortmund
Wegener M, Gnad F, Vannahme M (1986) The time scale of urban change. In: Hutchinson B, Batty M (eds) Advances in urban systems modelling. North Holland, Amsterdam, pp 145–197
Wilson AG (1967) A statistical theory of spatial distribution models. Transp Res 1(3):253–269
Wilson AG (1970) Entropy in urban and regional modelling. Pion, London
Zahavi Y (1974) Traveltime budgets and mobility in urban areas. Report FHW PL-8183. US Department of Transportation, Washington, DC
Zipf GK (1949) Human behaviour and the principle of least effort. Addison Wesley, Cambridge, MA

Network Equilibrium Models for Urban Transport

David Boyce
Department of Civil and Environmental Engineering, Northwestern University, Evanston, IL, USA

Contents
1 Introduction
2 Historical Overview
3 Model Formulations
3.1 Definitions and Assumptions
3.2 Methodological Approach
3.3 Deterministic Route Choice over a Road Network
3.4 Stochastic Route Choice over a Road Network
3.5 Mode and Route Choice over Road and Fixed Cost Networks
3.6 O-D, Mode, and Route Choice over Road and Fixed Cost Networks
4 Model Solution and Implementation
4.1 Solution Algorithms
4.2 Unique Route Flows and Multi-class Link Flows
5 Conclusions
References

Abstract

Methods for the analysis and prediction of travel conforming to macroscopic assumptions about choices of the urban population cut a broad swath through the field of regional science: economic behavior, spatial analysis, optimization methods, parameter estimation techniques, computational algorithms, network equilibria, and plan evaluation and analysis. This chapter seeks to expose one approach to the construction of models of urban travel choices and, implicitly, location choices. Beginning with the simple route choice problem faced by vehicle operators in a congested urban road network, exogenous constants are relaxed and replaced with additional assumptions and fewer constants, leading toward a more general forecasting method. The approach, and examples based upon it, reflects the author's research experience of 40 years with the formulation, implementation, and solution of such models.

Keywords

Public transport · Mode choice · Route choice · User equilibrium · Traffic assignment

1 Introduction

Journey times and costs are important variables in determining the wide range of choices available to individual travelers. To predict personal travel choices on congested urban road and public transport systems, journey times must be endogenous to the model. This statement is axiomatic. Otherwise, the representation of user congestion, a principal causative agent of urban travel and location choices, is not possible. This axiom provided the foundation for the original formulation of the road traffic network equilibrium model by Martin Beckmann (Beckmann et al. 1956). This seminal contribution, on which the entire field of urban travel choice modeling is implicitly based, was then overlooked for more than a decade. By the time it was rediscovered, a sequential, four-step paradigm had taken hold, consisting of (a) trip generation: the total amount of travel per time period (hour, day) that begins and ends at each location; (b) trip distribution: the amount of travel from every origin to every destination; (c) mode split: the proportion of trips by private cars, trains, buses, cycles, walking, and other modes of travel; and (d) traffic assignment: allocation of modal trip matrices to shortest routes to determine road link and transit line flows. Researchers then began to ask how to combine these steps into a more internally consistent method, only to arrive at Beckmann's original formulation and its extensions. Because of this irony of history, this literature became known as "combined models."

The objective of this chapter is to introduce one type of transportation network user-equilibrium model that originated from Beckmann's formulation: multi-class, multimodal, static models of origin-destination, mode, and route choices. Multi-class refers to models that consider two or more classes of travelers with different behavioral or choice characteristics. Multimodal refers to the consideration of all modes, such as public transport systems, but also including cycling and walking, in addition to motor vehicles on the road network. Static refers to models of constant flows over a congested period, such as the weekday morning or afternoon commuting period, possibly divided into intervals as short as 60 min. This focus stems from an interest in models that are useful for decision-making about long-range transportation investments as well as short-range demand management.

The era of building large-scale urban transportation infrastructure in developed urban economies has largely passed. Now these urban areas are focused on demand management issues, such as road pricing, as well as incremental additions to their road, public transport, and cycle-walkway systems. In contrast, rapidly developing urban economies, especially in Asia, are presently engaged in infrastructure development. Effective and efficient decisions for these systems' investment and management require an advanced evaluation framework to provide information on the distribution of impacts on residents, employers, and public agencies. Travel forecasts are central to such a framework. A conviction that travel forecasting models have the potential to be substantially superior to current travel forecasting practice, described by Ortúzar and Willumsen (2011), is one motivation for this chapter.

Following a brief historical overview, formulations of several models are offered, beginning with a basic model of route choice on a road network. Assumptions about what is exogenous to that model are then relaxed, enabling a more general model to emerge. Solution algorithms for these models are described, including the issue of the uniqueness of route flows and multi-class link flows. A brief discussion of future research and practice concludes the chapter. References emphasize seminal works in the field and syntheses useful to newcomers.

2 Historical Overview

The historical development of this field is complex, in part because separate strands of research and practice provide a variety of approaches. An extensive historical account and mathematically rigorous synthesis of the field with over 1,000 references was prepared by Patriksson (1994). Marcotte and Patriksson (2007) updated and substantially extended that earlier synthesis. Sheffi (1985) synthesized his own contributions on stochastic route choice, as well as integrating some findings of others. Oppenheim (1995) set out to write a textbook on travel demand models and, in addition, offered several theoretical advances to origin-destination-mode-route choice models based on random utility theory. Florian and Hearn (1995) synthesized the network equilibrium literature from the viewpoint of operations research. Bell and Iida (1997) articulated their view of transportation network analysis, including chapters on reliability and design. Nagurney (1999) explored the application of variational inequalities to a variety of network-related problems. A review of implemented combined models was offered by Boyce and Bar-Gera (2004). A more technical explanation of Beckmann's formulation was offered by Boyce (2013). This overview is organized by groups of academic researchers working along similar lines. Beckmann did not follow up on his innovation. Instead, research extending Beckmann's model was undertaken by Stella Dafermos and her contemporaries. From her 1968 Ph.D. thesis until her death in 1990, Dafermos established a wide-ranging theory of traffic network equilibrium, including contributions to models with variable and fixed demand, treatment of multiple user classes and asymmetric cost functions, and, perhaps most importantly, extensions and applications of the theory of variational inequalities to transportation network equilibria. From the late 1970s, Michael Smith independently pursued a similar line of inquiry, focused on traffic equilibrium, traffic signal timing, and road pricing. Patriksson (1994) lists 19 references by Dafermos and 14 references by Smith.


The Centre for Research on Transportation at the University of Montreal, founded in 1972, embarked on theoretical research, model implementation, and testing. Initially led by Michael Florian (2008), successive generations of faculty and students made sustained contributions to network equilibrium modeling. Contributions to solving the transportation network equilibrium problem with variable demand, including mode choice, were made by Florian and Nguyen during the 1970s. Subsequently, several of these methods were implemented in EMME (www.inro.ca), an interactive-graphic multimodal urban transportation planning system. In the United Kingdom in the mid-1960s, John Murchland sought to devise an alternative to the sequential paradigm, but it was Suzanne Evans (1976) who devised a way to combine trip distribution and traffic assignment models into a single formulation, an optimization problem consisting of two parts, one related to route choice as in Beckmann's formulation and the other related to trip distribution, as suggested by Wilson (1970). Evans extensively explored the mathematical properties of her formulation and proposed a solution algorithm; see Sect. 4.1. Boyce began to implement the formulation and algorithm of Evans in 1976. Over the next 25 years, he and his students, in separate collaborations with LeBlanc and Lundqvist, implemented a single-class, two-mode combined model on aggregated networks of Chicago and Stockholm. Model parameters were borrowed from other studies at first, but later estimated in a way that is self-consistent with the model solution. Boyce and Bar-Gera (2003) and several collaborators implemented, estimated, and validated a two-class, two-mode combined model at the same level of detail used by transportation planning professionals for the Chicago region. A book-length synthesis and critique of this field was published by Boyce and Williams (2015). In 1986, researchers in Chile began to implement multi-class combined models emphasizing route choices in the congested public transport network with several submodes found in Santiago (De Cea et al. 2005). This effort led to the development of ESTRAUS (www.mctsoft.com), which has been applied to Santiago and other Chilean cities. Aashtiani and Magnanti formulated a combined mode choice and traffic assignment model based on nonlinear complementarity theory, and Safwat and Magnanti extended this formulation to include trip generation as well as trip distribution; see Patriksson (1994) for references. Abrahamsson and Lundqvist (1999) extended a model of the Stockholm region to include parameter estimation methods and tested alternative specifications of nested travel choice functions. The author submits that there are different views of how to model urban travel, which are often mutually stimulating to research and practice. For example, another view poses separate travel demand and network cost models, which are solved jointly with an iterative equilibration procedure. From this perspective, there is less emphasis on model integration and more focus on model structure, parameter estimation, and solution procedures for the separate demand and network models. This approach may offer more flexibility to innovative modelers, who indeed often describe themselves as either demand modelers or network modelers, but seldom both. However, it offers fewer opportunities to analyze the properties of the entire model structure and to ensure the consistency of the overall approach.

3 Model Formulations

Formulations and analyses of combined models of travel choice on congested urban transportation networks based on constrained optimization methods are introduced here, articulating one way to derive models of varying degrees of scope and complexity. Detailed statements of model properties are omitted but may be identified using standard techniques for deriving the optimality conditions for a convex function with equality and inequality constraints, as stated in Sect. 3.2. The model formulations represent the conventional (traditional) way of describing urban travel, known as trip-based, which originated in the United States in the 1950s. Activity-based or tour-based models, which are more representative of actual travel choices, are the subject of current research and advanced practice, but are not considered here.

3.1 Definitions and Assumptions

The following assumptions are briefly stated in agreement with current practice:

1. An urban region is divided into small, relatively homogeneous zones. Zone size varies with the density of development, so that activity levels per zone are relatively similar.

2. Urban activities in zones are described in terms of (a) residential population and households; (b) employment, education (primary, secondary, higher), and day care; (c) shopping, personal and business services, recreation, etc.

3. Facilities for urban activities consist of land and buildings: (a) residences, (b) workplaces, (c) schools, (d) shopping and service centers, and (e) parks and recreational facilities.

4. Travel occurs on two types of transportation systems or modes: (a) private vehicles/ways for driving-cycling-walking/traffic control system and (b) bus or train/roadway or railway/operations plan. Trucks also use the roadway system, depicted in car equivalent units. Transportation systems are represented as networks of nodes, links, link attributes, and, for public transport, routes of scheduled services. Service characteristics of links depend on fixed parameters related to physical roadways and vehicle characteristics: (a) length, number, and width of lanes by type, including cycleways and walkways, and grade; (b) public transport station spacing and vehicle performance; (c) control and operations plans: speed limits, signal settings, and road tolls; and (d) service frequencies, operating speeds, and public transport fares.

5. Other variables related to travel flows (demand) also determine service characteristics: (a) flows of cars, trucks, cycles, and pedestrians and (b) public transport boardings and alightings at stops per unit time. Taken together, these variables determine the performance characteristics of individual links:

link travel time = f(flows; fixed vehicle/way characteristics and operations plans)

Such cost performance functions are sometimes confused with supply functions. In a supply function, specific aspects of the vehicle-way-operations plan are not fixed but are decision variables representing the operator or supplier of services. In contrast, in a travel forecast for a given scenario based on performance functions, optimal values of supply parameters are generally not represented. For example, traffic signal timings and public transport service frequencies are not optimized in response to the travel forecast.

6. Travel between daily activities (residing, working, eating, shopping, schooling, recreating) may be described in terms of pairs of activities linked by trips: (a) home-work, (b) work-eat meal, (c) work-shop, (d) shop-home, etc. Over the 24 h weekday, travel related to several activities makes up a sequence of trips connecting various purposes, or tour. The duration of the activities and the times required for travel determine the daily geographic range. In the trip-based approach, individual trips are aggregated by purpose and forecast as separate groups. Whether travel occurs by private car, either alone or with others, by public transport, cycle, or walking depends on the availability of modes, their relative service times and monetary costs, as well as intangible factors like comfort and convenience. The timing of travel during the day also depends on constraints imposed by activity schedules and the travel conditions on the private and public networks.

7. Travel occurs during a given period of the 24 h weekday, such as the morning peak commuting period. To represent observed trips, with their specific departure and arrival times, as flows (persons/unit time), a transformation is required, such as: (a) all travelers departing from home for work during 6–9 a.m. are counted as flows in persons/hour, and (b) all travelers arriving at work from home during 6–9 a.m. are counted as flows in persons/hour.

For transportation systems planning, facility design, operations planning, or conformance with air quality regulations, forecasts of the following variables are required for each transportation system/activity pattern scenario:

(i) Flows of private cars, trucks, cycles, pedestrians, and public transport vehicles on the road network by morning and evening peak commuting periods and for longer periods when travel conditions are relatively stable
(ii) Flows of persons on the public transport network by submode by time period
(iii) Flows of persons from origin zone to destination zone by private vehicles and public transport by time period

A capability to examine changes in these flows in response to changes in network layout, capacity and service attributes, monetary costs (e.g., fuel, tolls, fares, parking fees), and changes in zonal activity levels is required.
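The chapter leaves the functional form of the cost performance function open. A widely used concrete instance, offered here only as an illustration, is the BPR (Bureau of Public Roads) formula, in which link travel time rises polynomially with the volume-capacity ratio; free-flow time and capacity are the fixed way characteristics, and flow is the demand-related variable.

```python
def bpr_travel_time(flow, free_flow_time, capacity, alpha=0.15, beta=4.0):
    """BPR link performance function: travel time as a nondecreasing function
    of link flow, given fixed way characteristics (capacity, free-flow time).
    alpha and beta are the conventional default values."""
    return free_flow_time * (1.0 + alpha * (flow / capacity) ** beta)

# Example: a link with 10 min free-flow time and capacity 1800 vph.
for flow in [0, 900, 1800, 2700]:
    print(f"flow {flow:4d} vph -> {bpr_travel_time(flow, 10.0, 1800.0):5.2f} min")
```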

3.2 Methodological Approach

Travel choice models may be formulated and analyzed using several methods for solving optimization and equilibrium problems. These problems include convex optimization, nonlinear complementarity, variational inequality, geometric optimality, and fixed point, roughly in increasing order of generality. Each of these methods has been applied in the formulation of travel choice models. This brief introduction is limited to minimization of a convex function subject to inequality constraints, which is suitable for the derivations in this chapter and is based on the classic Karush-Kuhn-Tucker theorem (Kuhn and Tucker 1951):

\[
\min_{(x)} f(x) \tag{1}
\]
\[
\text{s.t. } h_i(x) \ge 0, \quad i = 1, \ldots, m \tag{2}
\]

where x is an unknown vector of length n, f(·) is a strictly convex function, and h_i(x) ≥ 0, i = 1, …, m, is a set of linear constraints. The necessary conditions for f(x) to attain a local minimum at x* are

\[
\frac{\partial f(x^*)}{\partial x_j} - \sum_{i=1}^{m} \lambda_i^* \frac{\partial h_i(x^*)}{\partial x_j} = 0, \quad j = 1, \ldots, n \tag{3}
\]
\[
h_i(x^*) \ge 0, \quad \lambda_i^* h_i(x^*) = 0, \quad \lambda_i^* \ge 0, \quad i = 1, \ldots, m
\]

where λ_i* is a dual variable associated with the constraint h_i(x) ≥ 0. If the inequality constraints include nonnegativity conditions, x ≥ 0, then the optimality conditions can be written in a more compact and transparent manner, as follows:

\[
\frac{\partial f(x^*)}{\partial x_j} - \sum_{i=1}^{m} \lambda_i^* \frac{\partial h_i(x^*)}{\partial x_j} \ge 0, \quad j = 1, \ldots, n \tag{4}
\]
\[
x_j^* \left( \frac{\partial f(x^*)}{\partial x_j} - \sum_{i=1}^{m} \lambda_i^* \frac{\partial h_i(x^*)}{\partial x_j} \right) = 0, \quad j = 1, \ldots, n \tag{5}
\]
\[
h_i(x^*) \ge 0, \quad i = 1, \ldots, m \tag{6}
\]
\[
\lambda_i^* h_i(x^*) = 0, \quad i = 1, \ldots, m \tag{7}
\]
\[
x_j^* \ge 0, \quad j = 1, \ldots, n \tag{8}
\]
\[
\lambda_i^* \ge 0, \quad i = 1, \ldots, m. \tag{9}
\]

Equations (5) are complementary slackness conditions, which state that either x_j* = 0, or \( \frac{\partial f(x^*)}{\partial x_j} - \sum_{i=1}^{m} \lambda_i^* \frac{\partial h_i(x^*)}{\partial x_j} = 0 \), or both. As shown below, these conditions are needed for deriving the equilibrium conditions on route flows.
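As a concrete check of these conditions, consider a toy problem invented for this purpose, written in the same sign convention (constraints h(x) ≥ 0 and x ≥ 0): minimize f(x) = (x₁ − 1)² + (x₂ − 2)² subject to h(x) = x₁ + x₂ − 4 ≥ 0. Projecting the unconstrained minimum (1, 2) onto the constraint line gives x* = (1.5, 2.5) with λ* = 1, and conditions (4)-(7) can be verified numerically:

```python
import numpy as np

# Toy problem: min (x1-1)^2 + (x2-2)^2  s.t.  h(x) = x1 + x2 - 4 >= 0, x >= 0.
# Known solution (by projection onto the constraint line): x* = (1.5, 2.5), lambda* = 1.
x = np.array([1.5, 2.5])
lam = 1.0

grad_f = np.array([2 * (x[0] - 1), 2 * (x[1] - 2)])  # gradient of f at x*
grad_h = np.array([1.0, 1.0])                        # gradient of h
reduced = grad_f - lam * grad_h                      # reduced gradient, condition (4)

print("condition (4), reduced gradient >= 0:", reduced)           # [0. 0.]
print("condition (5), x_j * reduced gradient = 0:", x * reduced)  # [0. 0.]
print("condition (6), h(x*) >= 0:", x.sum() - 4)                  # 0.0
print("condition (7), lambda * h(x*) = 0:", lam * (x.sum() - 4))  # 0.0
```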

3.3 Deterministic Route Choice over a Road Network

In 1952, John Wardrop, a British traffic scientist, proposed the following criterion to describe traffic flows on a route:


The journey times on all routes actually used are equal, and less than those which would be experienced by a single vehicle on any unused route. . . . (this) criterion is quite a likely one in practice, since it might be assumed that traffic will tend to settle down into an equilibrium situation in which no driver can reduce his journey time by choosing a new route. (Patriksson 1994, p. 31).

The first sentence is now known as Wardrop's first principle of network user equilibrium (UE). Beckmann et al. (1956, p. 59), working at the Cowles Commission for Research in Economics at the University of Chicago during 1951–1954, described the concept of equilibrium more generally:

Demand refers to trips and capacity refers to flows on roads. The connecting link is found in the distribution of trips over the network according to the principle that traffic follows shortest routes in terms of average cost. The idea of equilibrium in a network can then be described as follows. The prevailing demand for transportation, that is, the existing pattern of originations and terminations, gives rise to traffic conditions that will maintain that same demand. Or, starting at the other end, the existing traffic conditions are such as to call forth the demand that will sustain the flows that create these conditions.

Then, they described their concept of route equilibrium as follows (p. 60):

. . . the principle of traffic distribution among alternative routes in equilibrium. (1) If between a given origin and a given destination more than one route is actually traveled, the cost of transportation to the average road user, as indicated by the average-cost capacity curves, must be equal on all these routes. (2) Since the routes used are the "shortest" ones under prevailing traffic conditions, average cost on all other possible routes cannot be less than that on the route or routes traveled. (3) The amount of traffic originated per unit of time must equal the demand for transportation at the trip cost which prevails.

Note that these statements reflect Beckmann's view that origin–destination demand is variable, whereas Wardrop considered fixed flows. McGuire stated they were not aware of Wardrop's paper at the time, although he became aware of it later (personal interview, 1999). By "routes actually used," Wardrop meant the routes used from a given origin to a given destination, which may be defined as zones. If routes consist of sequences of links, then route costs may be defined as the sum of the link costs along the route. Since link costs depend on link flows, through the cost performance function, and each link serves (possibly) many routes, identifying the route costs which satisfy Wardrop's principle involves solving the route choice problem simultaneously for a system of zones and a network. Link flows have units of vehicles/hour (vph), so route flows and origin-destination (O-D) flows also have units of vehicles/hour or persons/hour. The resulting route choice model is a steady-state flow model in which no individual travels from an origin to a destination. Rather, O-D-route flows occur with corresponding flows on the links of each used route. This formulation leads to a relatively simple concept of congestion, with no bottlenecks or traffic jams, but only steadily flowing vehicles traveling at speeds determined by cost performance functions.


The user-equilibrium link flows and costs, and a set of route flows, corresponding to fixed O-D flows may be determined by solving the following constrained optimization problem:

\[
\min_{(h)} z(h) = \sum_{a \in A} \int_0^{f_a} c_a(x)\,dx \tag{10}
\]
\[
\text{s.t. } \sum_{r \in R_{pq}} h_r = d_{pq}, \quad p \in P;\ q \in Q
\]
\[
h_r \ge 0, \quad r \in R_{pq},\ p \in P;\ q \in Q
\]
\[
\text{where } f_a \equiv \sum_{pq} \sum_{r \in R_{pq}} h_r\, \delta_{ar}, \quad a \in A
\]

f_a = flow of all vehicles on link a (vph),
c_a(f_a) = generalized travel cost function for link a, a nondecreasing function of link flow f_a,
h_r = flow of vehicles on route r of the set of routes R_pq connecting zone p to zone q (vph),
d_pq = exogenous flow of vehicles from zone p to zone q (vph),
δ_ar = 1, if link a belongs to route r from zone p to zone q, and 0 otherwise,
P, Q = sets of origin and destination zones, respectively,
A = set of links in the network.

The unknown variables are vehicle route flows h = (h_r); vehicle link flows (f_a) are defined in terms of the route flows. The link-route correspondence matrix (δ_ar) is exogenous. To simplify the derivation, truck flows are included in the single O-D matrix (d_pq). The optimality conditions for the above problem may be stated as follows:

\[
\sum_{a \in A} c_a(f_a)\,\delta_{ar} - u_{pq} \ge 0, \quad r \in R_{pq},\ p \in P,\ q \in Q
\]
\[
h_r \left( \sum_{a \in A} c_a(f_a)\,\delta_{ar} - u_{pq} \right) = 0, \quad r \in R_{pq},\ p \in P,\ q \in Q
\]
\[
\left( \sum_{r \in R_{pq}} h_r - d_{pq} \right) \ge 0, \quad p \in P,\ q \in Q
\]
\[
u_{pq} \left( \sum_{r \in R_{pq}} h_r - d_{pq} \right) = 0, \quad p \in P,\ q \in Q
\]
\[
h_r \ge 0, \quad r \in R_{pq}; \qquad u_{pq} \ge 0, \quad p \in P,\ q \in Q \tag{11}
\]


where u_pq is a dual variable associated with the conservation of flow constraint defined on the exogenous O-D flow d_pq. Conditions (11) may be interpreted as follows for O-D pair pq:

(i) Assume h_r > 0; then \( \sum_{a \in A} c_a(f_a)\,\delta_{ar} - u_{pq} = 0 \), or \( C_r \equiv \sum_{a \in A} c_a(f_a)\,\delta_{ar} = u_{pq} \).
(ii) Assume h_s = 0; then \( \sum_{a \in A} c_a(f_a)\,\delta_{as} - u_{pq} \ge 0 \), or \( C_s \ge u_{pq} \).
(iii) Assume C_t > u_pq; then h_t = 0,

where C_r is the travel cost on route r, the sum of the costs of the links comprising route r. Hence, every used route connecting zone p to zone q has a generalized travel cost equal to u_pq, and no unused route has a lower travel cost. Thus, this formulation corresponds to Wardrop's first principle. These conditions are shown in the first row of Table 1.

In the above formulation, identical and rational travelers are assumed to know accurately their travel times over alternative routes from their origins to destinations. The source of this information can only be described as being from their experience. Modern in-vehicle navigation systems may offer travel time information over one route at some point in time, but generally not over several alternative routes.

Table 1 Deterministic models

Choice: Route — Equilibrium conditions:
h_r > 0 ⇒ C_r = u_pq; h_s = 0 ⇒ C_s ≥ u_pq; C_t > u_pq ⇒ h_t = 0,
where C_r ≡ Σ_{a∈A} c_a(f_a) δ_ar, r ∈ R_pq, and Σ_{r∈R_pq} h_r = d_pq, p ∈ P, q ∈ Q

Choice: Mode and route — Equilibrium conditions:
h^c_r > 0 ⇒ C^c_r = u^c_pq; h^c_r = 0 ⇒ C^c_r ≥ u^c_pq; C^c_r > u^c_pq ⇒ h^c_r = 0,
where C^c_r ≡ Σ_{a∈A} c_a(f_a) δ_ar, r ∈ R^c_pq, and Σ_{r∈R^c_pq} h^c_r = d^c_pq;
d^c_pq > 0 ⇒ u^c_pq = κ_pq; d^c_pq = 0 ⇒ u^c_pq ≥ κ_pq; u^c_pq > κ_pq ⇒ d^c_pq = 0;
d^n_pq > 0 ⇒ C^n_pq = κ_pq; d^n_pq = 0 ⇒ C^n_pq ≥ κ_pq; C^n_pq > κ_pq ⇒ d^n_pq = 0, n ∈ N

Choices: O-D, mode, and route; Mode, O-D, and route — see the note below

Note: Deterministic equilibrium conditions for O-D flows correspond to the solution of a cost-minimizing allocation of origins to destinations, known as the classical transportation problem of linear programming (Evans 1973). The solution corresponds to a deterministic model for η → ∞ in the case of the O-D-mode model and the mode-O-D model. Based on empirical studies of origin-destination flows, such solutions are considered to be unrealistic for urban travel choices.
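Conditions (11) can be verified on the smallest possible example: one O-D pair served by two parallel links, each assumed to constitute a route of its own, with invented linear cost functions. Requiring equal costs on both used routes determines the deterministic user equilibrium in closed form:

```python
# Two parallel links serving one O-D pair; invented linear cost functions:
#   c1(f1) = 10 + 0.010 * f1
#   c2(f2) = 15 + 0.005 * f2
# Fixed demand d = 1000 vph. At user equilibrium with both routes used,
# condition (11) requires c1(f1) = c2(f2) = u_pq.

d = 1000.0
# Solve 10 + 0.010*f1 = 15 + 0.005*(d - f1) for f1:
f1 = (15 + 0.005 * d - 10) / (0.010 + 0.005)
f2 = d - f1
c1 = 10 + 0.010 * f1
c2 = 15 + 0.005 * f2
print(f"f1 = {f1:.1f} vph, f2 = {f2:.1f} vph")  # 666.7 and 333.3
print(f"c1 = {c1:.3f}, c2 = {c2:.3f}")          # both 16.667 = u_pq
```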


Moreover, the model formulation applies to a relatively long time period, such as the morning peak period, during which actual route travel times may vary widely. This assumption implies that the model solution corresponds to a deterministic user equilibrium (DUE); it is relaxed somewhat in the next subsection. If the generalized cost functions are strictly increasing with link flows, then the objective function is strictly convex, guaranteeing that the solution is unique in the link flows. The solution is not unique in the route flows, however, or in the class link flows in the case that two or more demand classes are specified, since the total link flows are linear functions of the route flows; therefore, the objective function is not strictly convex in the route flows, as required for uniqueness. The above formulation applies to the case in which each link performance function depends only on the link's own flow, which is called separable. In a more general case, called symmetric, each link performance function depends on a specified vector of link flows such that the effect of a change in link a's flow on link b equals the effect of a change in link b's flow on link a for all links specified in the cost function. An example of such a vector of link flows is the links entering an intersection. Generally, intersection delays are not symmetric, so this requirement is not met. Models not exhibiting such symmetries are called asymmetric and may be formulated as variational inequality problems (Patriksson 1994, pp. 74–77).

In contrast to the above model, the conventional approach to modeling route choice on a public transport network is relatively simple (a small sketch follows below):

(i) Represent all submodes (bus, rapid transit, commuter rail, express bus) in one network.
(ii) Find a minimal generalized travel cost route from origin zone p to destination zone q considering access, waiting, boarding, in-vehicle, and transfer times as well as fares; congestion at boarding and alighting is not considered.
(iii) Assign all public transport trips from zone p to zone q to a single minimal cost route, which is called all-or-nothing assignment.

Even if all-or-nothing assignment to minimal cost routes is considered adequate, representing public transport networks is more complex than road networks. The use of only one route between each O-D pair is simplistic if several public transport options are offered. Methods for modeling public transport route choice are found in Ortúzar and Willumsen (2011, pp. 373–80).
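A minimal sketch of steps (ii) and (iii) above, with an invented four-node network whose generalized link costs are assumed to already include waiting, boarding, and transfer components: find the single minimal-cost route by a label-setting shortest path search and load the entire O-D flow onto it.

```python
import heapq

def dijkstra(succ, origin):
    """Minimal generalized-cost labels from origin; succ[i] = [(j, cost), ...]."""
    cost, pred, heap = {origin: 0.0}, {}, [(0.0, origin)]
    while heap:
        c, i = heapq.heappop(heap)
        if c > cost.get(i, float("inf")):
            continue  # stale heap entry
        for j, cij in succ.get(i, []):
            if c + cij < cost.get(j, float("inf")):
                cost[j], pred[j] = c + cij, i
                heapq.heappush(heap, (cost[j], j))
    return cost, pred

# Invented network: generalized costs include waiting and transfer time.
succ = {"p": [("a", 8.0), ("b", 11.0)],
        "a": [("q", 7.0)],
        "b": [("q", 3.0)]}
cost, pred = dijkstra(succ, "p")

# All-or-nothing: assign the whole O-D flow to the single minimal-cost route.
flow_pq, node, route = 500.0, "q", ["q"]
while node != "p":
    node = pred[node]
    route.append(node)
print("min cost p->q:", cost["q"], "via", list(reversed(route)), "flow:", flow_pq)
```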

3.4 Stochastic Route Choice over a Road Network

A way to relax the deterministic route choice model, and possibly make it more realistic, is to introduce an additional constraint. The role of this constraint is to soften or blur the deterministic character of the above model by allowing some portion of each O-D flow to choose routes with higher costs. Before exploring this idea, it is appropriate to ask how many routes are used by each O-D pair in the deterministic solution. To answer this question, a moderately congested car O-D matrix was computed for the 1790 zone system of the Chicago region and added to a truck O-D matrix obtained from the region's planning organization; see Bar-Gera and Boyce (2007). The total vehicle flow of 1,349,000 vph between 3,174,000 zone pairs was assigned to the Chicago regional network. The total number of UE routes in a very precise solution was 8,573,000, or 2.70 routes per O-D pair. Of these O-D pairs, about 55% have only one route, which seems surprising since many of these routes are very long. About 90% of O-D pairs have five or fewer routes, and 99% have 20 or fewer routes. However, one O-D pair has 1,920 routes, the maximum in this solution. Figure 1 shows the number of O-D pairs on the y-axis versus the number of routes per O-D pair on the x-axis. The cumulative number of O-D pairs is shown starting at 1.0 at the upper left, decreasing to 1E-7 (0.0000001) at the lower right. Note where the line crosses the second horizontal line labeled 0.10 on the right y-axis; the five dots to the left of this intersection account for 90% of all O-D pairs. In a more congested solution, the number of routes is much larger. If one wished to distribute some O-D flow to higher cost routes, how might this be accomplished? Such a redistribution may be considered to be a "dispersion" of choices from the cost-minimizing UE routes to higher cost routes. A function depicting such a dispersion of proportions of route flow is available for this purpose. Known as the entropy function (Erlander and Stewart 1990, pp. 21–25), it has a one-to-one correspondence with the well-known logit function.

[Fig. 1 Number of O-D pairs versus number of routes per O-D pair. Log-log plot; left y-axis: number of O-D pairs (1 to 10,000,000); right y-axis: cumulative proportion of O-D pairs (1.0 down to 0.0000001); x-axis: number of routes per O-D pair (1 to 10,000).]


A constraint can be formed to represent such a dispersion. Travelers strictly take the least cost route in the deterministic solution, so it is the least dispersed feasible solution to problem (10). By constraining the route choices to be greater than this minimum level, some choices are shifted to higher cost routes. The form of the constraint is

\[
-\sum_{p \in P} \sum_{q \in Q} \sum_{r \in R_{pq}} h_r \ln(h_r) > S^{UE} \tag{12}
\]

where S^{UE} represents the dispersion of the choices in the DUE solution. Since there are unlikely to be any data at present on the dispersion of routes in a large network, a route dispersion constraint is simply a conceptual device. Modifying conditions (11) to include the effect of the dispersion constraint, one may obtain

\[
\sum_{a \in A} c_a(f_a)\,\delta_{ar} + \frac{1}{\theta}(\ln h_r + 1) - u_{pq} \ge 0, \quad r \in R_{pq},\ p \in P,\ q \in Q
\]
\[
h_r \left( \sum_{a \in A} c_a(f_a)\,\delta_{ar} + \frac{1}{\theta}(\ln h_r + 1) - u_{pq} \right) = 0, \quad r \in R_{pq},\ p \in P,\ q \in Q
\]
\[
\left( \sum_{r \in R_{pq}} h_r - d_{pq} \right) \ge 0, \quad p \in P,\ q \in Q
\]
\[
u_{pq} \left( \sum_{r \in R_{pq}} h_r - d_{pq} \right) = 0, \quad p \in P,\ q \in Q
\]
\[
-S - \sum_{p \in P} \sum_{q \in Q} \sum_{r \in R_{pq}} h_r \ln(h_r) \ge 0
\]
\[
\frac{1}{\theta} \left( S + \sum_{p \in P} \sum_{q \in Q} \sum_{r \in R_{pq}} h_r \ln(h_r) \right) = 0
\]
\[
h_r \ge 0, \ r \in R_{pq}; \qquad u_{pq} \ge 0, \ p \in P,\ q \in Q; \qquad \frac{1}{\theta} > 0 \tag{13}
\]

where S > S^{UE} is the dispersion of an entropy-constrained solution, and 1/θ is the dual variable corresponding to the entropy constraint. The reason that it is defined as a reciprocal will become clear shortly. Because the route flow h_r appears as the argument of the natural logarithm, it cannot take on a value of zero. Hence, all route flows are positive. Solving the complementary slackness condition for h_r > 0, the following optimality conditions may be obtained:


\[
\ln h_r = \theta \left( u_{pq} - C_r \right) - 1, \ \text{so}
\]
\[
h_r = \exp\left( \theta\, u_{pq} - 1 - \theta\, C_r \right), \quad \text{where } C_r \equiv \sum_{a \in A} c_a(f_a)\,\delta_{ar}. \tag{14}
\]

Apply the conservation of route flow constraint to this expression for h_r to obtain

\[
\sum_{r \in R_{pq}} h_r = d_{pq} = \exp\left( \theta\, u_{pq} - 1 \right) \sum_{r \in R_{pq}} \exp(-\theta\, C_r) \tag{15}
\]
\[
\exp\left( \theta\, u_{pq} - 1 \right) = \frac{d_{pq}}{\sum_{r \in R_{pq}} \exp(-\theta\, C_r)}. \tag{16}
\]

Substituting this expression into the equation for h_r yields the logit route choice function:

\[
h_r = d_{pq}\, \frac{\exp(-\theta\, C_r)}{\sum_{r \in R_{pq}} \exp(-\theta\, C_r)}. \tag{17}
\]

These conditions are shown in the first row of Table 2. Examination of these conditions reveals the structure of the stochastic user-equilibrium (SUE) model as well as raising several issues. Corresponding to the dispersion constraint, a logit route choice function is obtained. The logit function allocates some portion of the O-D flow d_pq to every route connecting p to q. Of course, as C_r → +∞, exp(−θ C_r) → 0. For small values of θ, including zero, this limit could be ineffective. Therefore, a limit may need to be placed on the definition of routes and the maximum value of C_r. One possibility is that in traversing each link of a route, a driver must travel farther in cost from the origin and closer to the destination. However, the equilibrium travel costs are not available for the purpose of defining these route costs, so a fixed link cost, such as the free-flow time, might be used. Another option is to place an upper limit on the travel cost for each O-D pair, which could be related to the deterministic travel cost: C_r^{SUE} ≤ C_r^{max} ≡ k C_r^{DUE}, r ∈ R_pq, k > 1. Another problem with the logit route choice function is that the SUE routes are often not distinct alternatives. Rather, routes may have substantial overlapping segments in an urban application. As Patriksson (1994, p. 65) noted, O-D flows are overallocated to overlapping routes. Consequently, the logit model may yield overly large flows. This anomaly results from the inability of the logit model to account for the correlation between the costs of alternative routes, which stems from its independence of irrelevant alternatives property. Another problem is that the route choice probabilities are based solely on route cost differences and do not take their magnitudes into account. For additional discussion of SUE models, see Patriksson (1994, pp. 60–65) and Sheffi (1985).
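Both Eq. (17) and the overlapping-routes anomaly are easy to reproduce numerically; the costs and parameters below are invented. Splitting one route into two copies of equal cost raises that corridor's combined share, because the logit model treats the copies as independent alternatives.

```python
import numpy as np

def logit_route_flows(d_pq, costs, theta):
    """Logit route choice, Eq. (17):
    h_r = d_pq * exp(-theta*C_r) / sum over r' of exp(-theta*C_r')."""
    w = np.exp(-theta * np.asarray(costs))
    return d_pq * w / w.sum()

d_pq, theta = 1000.0, 0.5
print(logit_route_flows(d_pq, [10.0, 12.0], theta))        # approx [731, 269]
# Split the first route into two nearly identical (heavily overlapping)
# routes of the same cost; their combined flow rises from ~731 to ~845,
# illustrating the independence-of-irrelevant-alternatives problem.
print(logit_route_flows(d_pq, [10.0, 10.0, 12.0], theta))  # approx [422, 422, 155]
```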


Table 2 Stochastic and mixed stochastic-deterministic models Choices Route

Functions and equilibrium conditions Stochastic route choice: ) hr > 0; C r > C max ) hr ) 0 C r  C max r r expðθC r Þ  X ,θ 0 then d cpq > 0, and ucpq ¼ κpq, the equilibrium modal O-D cost from zone p to zone q. If κ pq ¼ C npq , for one or more of the fixed cost modes n  N, then; d npq  0 otherwise, C npq > κpq , and d npq ¼ 0. The following conclusions may be drawn for this deterministic formulation: (i) If O-D flows occur by car, then all used routes have equal cost, and no unused route has a lower cost. (ii) The UE costs of the used car routes not only determine the O-D cost but also determine whether any of the fixed cost modes (public transport, cycle, and walk) have sufficiently low costs to be used: if ucpq < C npq, then d npq ¼ 0 (no one uses mode n). If ucpq ¼ C npq , then the O-D cost of fixed cost mode n and car are equal, and use of mode n may occur. If ucpq > C npq ¼ κpq , then all O-D flow occurs by one or more fixed cost modes, such as public transport from an outer suburb to the CBD, and no one uses car. That is, either there is no fixed cost mode flow or the fixed mode cost sets a maximum level for the car costs for each O-D pair. Hence, the solution is “all-or-nothing” with respect to mode. (iii) If the car occupancy υ were not added to the first term of the objective function, and to the definition of link flow, then the car O-D cost would be different from the O-D equilibrium cost by a factor equal to υ. For consistency of the formulation, then, the parameter υ is needed in the objective function. One often observes travel by two or more modes (car, public transport, cycle, or walk) between many O-D pairs in survey data. Therefore, the formulation of the mode and car route choice model as a deterministic cost minimization problem, while instructive, is unrealistic. The relaxation of this deterministic


formulation is proposed through the addition of a modal dispersion constraint, as in the stochastic route choice model. A function representing modal dispersion may be imposed to make mode choices more dispersed than the DUE minimum level; that is, some choices are allocated to higher cost modes. The form of the constraint is

$$-\sum_{p,q}\sum_{m \in M} d^m_{pq}\, \ln\!\left(\frac{d^m_{pq}}{d_{pq}}\right) \ge S \qquad (19)$$

where S represents the level of dispersion of the choices to higher cost modes. Note that S cannot be observed except in very simple cases in which all of the observed choices are enumerated; S cannot be determined from sample data because a sample, by its nature, is less dispersed (more clustered) than the population. Let this constraint be added to the mode and route problem (18) above. The analysis of the optimality conditions is shown in the Mode and Route row of Table 2. Let $1/\mu$ be the dual variable associated with the dispersion constraint. Then, in the same way as the logit route choice function was derived in Sect. 3.4, a logit mode choice function may be derived, as shown in the upper panel of the Mode and Route row of Table 2. This choice function includes the fixed cost modes (public transport, cycle, and walk), as well as car with its endogenous deterministic route costs. Together with the same UE conditions for car route costs, the function depicts the equilibrium conditions for stochastic mode and deterministic route choice. By replacing the car deterministic route conditions with the stochastic route choice function, a combined stochastic mode and route choice function may be obtained, as shown in the lower panel of the Mode and Route row of Table 2. Here the O-D cost of the car mode is the "composite cost" derived from the denominator of the logit route choice function. This composite cost replaces the equal route costs for each O-D pair required by the deterministic conditions. The derivation of such a composite cost is given in the next section.
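As a numerical illustration of the duality just described, the sketch below (Python; the mode costs, demand, and parameter values are made up, not from the chapter) computes logit mode shares for several values of $\mu$ and evaluates the dispersion term $-\sum_m d^m \ln(d^m/d)$, i.e., the level S that each $\mu$ implicitly attains:

```python
import math

def mode_split(d_pq, costs, mu):
    """Logit mode choice: d^m = d_pq * exp(-mu*C^m) / sum_m' exp(-mu*C^m')."""
    w = {m: math.exp(-mu * c) for m, c in costs.items()}
    tot = sum(w.values())
    return {m: d_pq * wm / tot for m, wm in w.items()}

def dispersion(d_pq, d_m):
    """Entropy-type dispersion level: -sum_m d^m * ln(d^m / d_pq)."""
    return -sum(v * math.log(v / d_pq) for v in d_m.values() if v > 0)

# Hypothetical O-D pair with three modes: car, bus, walk (costs in minutes).
costs = {"car": 18.0, "bus": 24.0, "walk": 45.0}
d_pq = 500.0
for mu in (0.05, 0.2, 0.8):
    shares = mode_split(d_pq, costs, mu)
    print(mu, {m: round(v, 1) for m, v in shares.items()},
          round(dispersion(d_pq, shares), 1))
# A small mu yields nearly even shares (a high dispersion level S);
# a large mu concentrates flow on the lowest-cost mode (a low S),
# approaching the deterministic all-or-nothing solution.
```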

3.6 O-D, Mode, and Route Choice over Road and Fixed Cost Networks

The combined mode and route choice formulation can be further extended to include an origin–destination dispersion function in the same manner as described above for mode choice. In the version presented here, constraints are added to the mode and route choice formulation to derive a model corresponding to the classical trip distribution function (Wilson 1970, pp. 15–17). These constraints consist of origin and destination constraints and another dispersion function representing dispersion of trips to higher cost destinations, separately from the modal dispersion constraint. In the following development, the relationship of these two constraints is explored. This formulation may be stated as follows, by further augmenting problem (18):

$$\min_{(h^c,\, d)}\; z(h^c, d) = \upsilon \sum_{a \in A} \int_0^{f_a} c_a(x)\, dx + \sum_{p,q} \sum_{n \in N} C^n_{pq}\, d^n_{pq} \qquad (20)$$

subject to:

$$\sum_{r \in R^c_{pq}} h^c_r = d^c_{pq}, \quad p \in P,\; q \in Q$$

$$\sum_{m \in M} d^m_{pq} = d_{pq}, \quad p \in P,\; q \in Q$$

$$-\sum_{p,q,m} d^m_{pq}\, \ln\!\left(\frac{d^m_{pq}}{d_{pq}}\right) \ge S_M$$

$$\sum_q d_{pq} = \bar{O}_p, \quad p \in P$$

$$\sum_p d_{pq} = \bar{D}_q, \quad q \in Q$$

$$-\sum_{p,q} d_{pq}\, \ln d_{pq} \ge S_{PQ}$$

$$h^c_r \ge 0,\; r \in R^c_{pq},\; p \in P,\; q \in Q; \qquad d^m_{pq} \ge 0,\; m \in M,\; p \in P,\; q \in Q; \qquad d_{pq} > 0,\; p \in P,\; q \in Q$$

where $f_a \equiv \sum_{p,q} \sum_{r \in R^c_{pq}} h^c_r\, \delta_{ar} / \upsilon$, $a \in A$.

The O-D-mode flow $d^m_{pq}$ is assumed to be conditional on the O-D flow $d_{pq}$ through its insertion into the denominator of the mode dispersion constraint as an a priori flow. The modal flows $d^m_{pq}$ are constrained to sum to the O-D flow by the mode conservation of flow constraint. The origin–destination dispersion constraint is defined on $S_{PQ}$, and the O-D flows are constrained by the exogenous origin and destination totals, $\bar{O}_p$ and $\bar{D}_q$.

Analysis of the UE conditions for car proceeds in the same way as in the mode and route choice models with regard to the UE car cost $C^c_{pq}$. Consider the optimality conditions for $d^m_{pq}$ and $d_{pq}$:

$$\ln\!\left(\frac{d^m_{pq}}{d_{pq}}\right) = \mu\left(\kappa_{pq} - C^m_{pq}\right) - 1, \qquad \ln d_{pq} = \alpha_p + \beta_q - 1 - \eta\left(\kappa_{pq} + \frac{1}{\mu}\right) \qquad (21)$$


where $1/\eta$ is the dual variable for the O-D dispersion constraint, and $\alpha_p$ and $\beta_q$ are, respectively, the dual variables for the origin and destination constraints. Solving the first condition for $d^m_{pq}$ and applying the mode conservation of flow constraint yields

$$d^m_{pq} = d_{pq}\, \exp\left(\mu \kappa_{pq} - 1\right) \exp\left(-\mu C^m_{pq}\right) \qquad (22)$$

$$\sum_{m \in M} d^m_{pq} = d_{pq} = d_{pq}\, \exp\left(\mu \kappa_{pq} - 1\right) \sum_{m \in M} \exp\left(-\mu C^m_{pq}\right) \qquad (23)$$

Solving for the exponential function containing $\kappa_{pq}$,

$$\exp\left(\mu \kappa_{pq} - 1\right) = 1 \Big/ \sum_{m \in M} \exp\left(-\mu C^m_{pq}\right) \qquad (24)$$

which can then be substituted into Eq. (22) to yield, for the case of the car mode:

$$d^c_{pq} = d_{pq}\, \frac{\exp\left(-\mu C^c_{pq}\right)}{\sum_{m \in M} \exp\left(-\mu C^m_{pq}\right)} \qquad (25)$$

This result expresses the O-D-car flow as the O-D flow times a logit function based on the UE cost for car $C^c_{pq}$ and the costs of the fixed cost modes $C^n_{pq}$. A similar expression may be derived for the fixed cost modes. Now define $\exp(-\mu \tilde{C}_{pq}) \equiv \sum_{m \in M} \exp(-\mu C^m_{pq})$; taking logs and solving for $\tilde{C}_{pq}$ gives the modal composite cost from zone p to zone q,

$$\tilde{C}_{pq} = -\frac{1}{\mu} \ln \sum_{m \in M} \exp\left(-\mu C^m_{pq}\right) \qquad (26)$$

Note that Eq. (24) can be rearranged as $\exp\left(-\mu\left(\kappa_{pq} + \frac{1}{\mu}\right)\right) = \sum_{m \in M} \exp\left(-\mu C^m_{pq}\right)$. Therefore,

$$\tilde{C}_{pq} = \kappa_{pq} + \frac{1}{\mu} \qquad (27)$$

An expression for the O-D flow $d_{pq}$ can then be derived from optimality condition (21):

$$d_{pq} = \exp\left(\alpha_p + \beta_q - 1\right) \exp\left(-\eta\left(\kappa_{pq} + \frac{1}{\mu}\right)\right) = \exp\left(\alpha_p + \beta_q - 1\right) \exp\left(-\eta \tilde{C}_{pq}\right) \qquad (28)$$


By applying the origin and destination constraints, a more compact expression may be obtained:

$$d_{pq} = A_p \bar{O}_p\, B_q \bar{D}_q\, \exp\left(-\eta \tilde{C}_{pq}\right) = \frac{\bar{O}_p \bar{D}_q\, \exp\left(-\eta \tilde{C}_{pq}\right)}{\left[\sum_q B_q \bar{D}_q \exp\left(-\eta \tilde{C}_{pq}\right)\right]\left[\sum_p A_p \bar{O}_p \exp\left(-\eta \tilde{C}_{pq}\right)\right]} \qquad (29)$$

where $A_p$ and $B_q$ are balancing factors defined by Eq. (29) that ensure that $\bar{O}_p$, the exogenous originating flow from zone p, and $\bar{D}_q$, the exogenous terminating flow at zone q, are satisfied (Wilson 1970, pp. 22–25). By substituting the O-D function for $d_{pq}$ into the O-D-mode function for $d^m_{pq}$, the combined O-D-mode function may be stated as

$$d^m_{pq} = A_p \bar{O}_p\, B_q \bar{D}_q\, \exp\left(-\eta \tilde{C}_{pq}\right) \frac{\exp\left(-\mu C^m_{pq}\right)}{\sum_{m \in M} \exp\left(-\mu C^m_{pq}\right)} \qquad (30)$$
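The balancing factors $A_p$ and $B_q$ in Eq. (29) are defined implicitly in terms of each other and are computed in practice by Furness-type iterative balancing. The following minimal sketch (Python; the zone totals and composite costs are made up, not from the chapter) alternates between the two factor updates until the row and column totals $\bar{O}_p$ and $\bar{D}_q$ are reproduced:

```python
import math

def gravity(O, D, C, eta, tol=1e-10, max_iter=1000):
    """Doubly constrained gravity model (Eq. 29 form):
    d[p][q] = A[p]*O[p]*B[q]*D[q]*exp(-eta*C[p][q])."""
    nP, nQ = len(O), len(D)
    W = [[math.exp(-eta * C[p][q]) for q in range(nQ)] for p in range(nP)]
    A, B = [1.0] * nP, [1.0] * nQ
    for _ in range(max_iter):
        # Alternate the two implicit definitions of the balancing factors.
        A_new = [1.0 / sum(B[q] * D[q] * W[p][q] for q in range(nQ))
                 for p in range(nP)]
        B_new = [1.0 / sum(A_new[p] * O[p] * W[p][q] for p in range(nP))
                 for q in range(nQ)]
        done = max(abs(a - b) for a, b in zip(A, A_new)) < tol and \
               max(abs(a - b) for a, b in zip(B, B_new)) < tol
        A, B = A_new, B_new
        if done:
            break
    return [[A[p] * O[p] * B[q] * D[q] * W[p][q] for q in range(nQ)]
            for p in range(nP)]

# Hypothetical 2-origin, 2-destination example with composite costs C~_pq.
O = [600.0, 400.0]                 # originating flows O_p
D = [700.0, 300.0]                 # terminating flows D_q
C = [[10.0, 20.0], [15.0, 8.0]]    # composite costs
d = gravity(O, D, C, eta=0.1)
print([[round(v, 1) for v in row] for row in d])
print([round(sum(row), 1) for row in d])         # row sums reproduce O_p
print([round(sum(col), 1) for col in zip(*d)])   # column sums reproduce D_q
```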

This model may be extended further to include stochastic route choice, as described in Sect. 3.4. According to one hierarchy hypothesized to motivate the dispersion constraints, route choices are deterministic cost-minimizing functions of car travel costs, mode choices tend to be cost minimizing with some dispersion to the higher cost modes, and O-D choices are even less cost minimizing. By this rationale, the cost sensitivity parameters estimated from survey data for the logit functions should have numerical values such that $\eta \le \mu$: a larger value of the parameter means that travelers are more sensitive to the fixed transport costs and UE car costs than for a smaller parameter value. For an interpretation of these coefficients based on the utilities of the choices, see Williams (1977, pp. 330–336) and Oppenheim (1995, pp. 198–205). These values have additional implications in the logit function context. The cross elasticities of flow (demand) with respect to mode choice may be negative if $\eta > \mu$, meaning that an increase in the cost $C^{m'}_{pq}$ of a mode $m'$ would lead to a decrease in the demand on the competing mode $m''$, contradicting what would intuitively be expected of the transportation system (Abrahamsson and Lundqvist 1999, p. 93). An implication of estimated parameter values that violate this condition is that the hypothesis of the model is incorrect and that mode choice is less cost minimizing than O-D choice or, equivalently, that O-D choice should be conditional on mode choice. This situation led Abrahamsson and Lundqvist (1999, pp. 86–87) to hypothesize the "reverse nested combined model," shown in the fourth row of Table 2. For this hypothesis, mode choice is less cost minimizing than O-D choice. If the costs of the modes are very different (low for car and high for public transport), a very small value of $\mu$ could be required for the estimated model to predict the sample choices correctly.


Boyce and Bar-Gera (2003) found that the parameter size condition was violated for other travel (not home-to-work travel) during the morning peak period for the Chicago area in 1990. Formulation and solution of such models for forecasting and scenario analysis are only meaningful if travel is segmented into several homogeneous classes and implemented for time periods of the day with relatively stable levels of congestion.

4 Model Solution and Implementation

The solution and implementation of combined models of urban travel choice proceeded slowly in comparison with the sequential procedure in travel forecasting practice. Even so, combined models have provided a framework and basis for evaluating solution methods used in practice. This section briefly traces the evolution of solution methods for combined models, and describes a few notable efforts regarding their implementation and validation, concluding with a discussion of a new traffic assignment method for finding unique route flows and multi-class link flows.

4.1 Solution Algorithms

When the formulation of a model of variable demand and route choice on a network was first proposed by Beckmann et al. (1956), no solution algorithm was offered. Despite the needs of transportation planning studies in the United States (1955–1975) and the United Kingdom (1960–1975) to forecast urban travel under congested conditions, the potential contribution of Beckmann's formulation was not recognized. By the late 1960s, several Ph.D. students had rediscovered Beckmann's formulation and began to propose solution algorithms. Among these, the algorithm of Suzanne Evans was the most detailed and promising (Evans 1976). Evans proposed an iterative, convergent algorithm for trip distribution and traffic assignment (O-D and road route choice) that linearized the objective function only as necessary, and otherwise used the O-D choice functions directly. Her algorithm may be summarized as follows:

Step 1. Find an initial solution: for free-flow link travel costs by road, find the least cost car routes between all zone pairs, compute an initial car O-D matrix with the travel choice function, and assign it to the least cost routes, resulting in an initial link flow vector; these arrays define a current feasible solution.

Step 2. For the travel costs corresponding to the current road link flow vector, find the least cost car routes, compute a new O-D matrix with the travel choice function, and assign the car O-D matrix to the least cost routes, resulting in a new link flow vector.

Step 3. Find weights (1 − λ) and λ, 0 ≤ λ ≤ 1, which, when used to compute a weighted average of the current and new O-D matrices and link flow vectors, minimize the


objective function of the formulation augmented by the nonlinear dispersion function times its dual variable.

Step 4. Compute a convergence measure for the updated O-D matrix and link flow vector; if the solution has not converged to the target level, update the current solution and return to Step 2; otherwise, stop.

The above algorithm is a partial linearization method for solving a convex optimization problem (Patriksson 1994, pp. 104–111). Although convergent, and useful for solutions on the mainframe computers of the 1970s–1980s era, the method converges slowly after the first several iterations. At that time, computer resources were generally insufficient to permit more than a few iterations for implementations with several hundred zones and a few thousand links, typical of that period. A related algorithm, now known as the Frank-Wolfe method, began to be used from the mid-1970s to solve the road traffic assignment problem with fixed O-D flows; a minimal sketch of this class of methods appears at the end of this subsection.

Combined models of travel choices have been implemented and estimated since the early 1980s. To be realistic for practice, such models should represent two or more classes of travelers plus trucks. An early multi-class model was implemented and estimated by Lam and Huang (1992). A model implementation similar in scale to those used in practice was undertaken by Boyce and Bar-Gera (2003) for the morning peak period of the Chicago region. Extensive estimation studies were undertaken; the model was validated with travel-to-work data from the 1990 US Census, contributing new methods for model validation as well as model estimation. That model was solved with the Evans algorithm, the state of the art at that time.

As computers expanded in size and speed during the 1980s, with the introduction of supercomputers, engineering workstations, and personal computers of similar memory and speed, algorithms for solving the traffic assignment problem with greater precision and speed were proposed (Bar-Gera 2002; Dial 2006). These algorithms were origin-based, in contrast to the link-based assignment algorithm based on the full linearization of the objective function. Bar-Gera and Boyce (2003) applied Bar-Gera's origin-based traffic assignment algorithm to devise a solution method for the origin-destination, mode, and car route choice problem that achieves more precise convergence than is possible with the Evans algorithm for large-scale problems. This algorithm replaced the link-based assignment in Step 2 with an origin-based procedure to update the solution of the assignment problem. The O-D matrices are then updated, followed by another assignment update, continuing until the convergence criterion is met. Unlike the Evans algorithm, a line search and averaging of solutions are not required.

De Cea et al. (2005) implemented and estimated a combined model for Santiago, Chile. A software system for solving the model, ESTRAUS, was created and applied in the redesign of the public transport system of Santiago (http://en.wikipedia.org/wiki/Transantiago). STGO, a software application in EMME, was created to implement a closely related model (Florian et al. 2002). These two systems represent two further implementations of combined models that are used by practitioners. CUBE (www.citilabs.com), TransCAD (www.caliper.com), and VISUM (www.ptv.de) can also serve as platforms for implementing combined models.


However, such implementations require substantial knowledge and programming skills. Solution of the stochastic route choice problem for large networks at a level of precision similar to DUE remains a work in progress. Lee et al. (2010) provided a detailed literature review, proposed two new algorithms, and reported computational results for a problem of moderate size. The use of a method proposed by Bell to find routes avoids the use of a maximum route length. Other problems remain, however, including overlapping routes, as discussed in Sect. 3.4.
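Returning to the deterministic assignment algorithms discussed above, here is a minimal fixed-demand sketch (Python; the two-parallel-link network and BPR-style cost parameters are illustrative, not from the chapter) of the full-linearization scheme: each iteration loads all flow onto the currently cheapest link (the linearized subproblem) and then line-searches on the Beckmann objective, as in the link-based Frank-Wolfe method that the chapter contrasts with the Evans and origin-based algorithms:

```python
def cost(a, f):
    """BPR-style link cost t0*(1 + 0.15*(f/k)^4); parameters are illustrative."""
    t0, k = a
    return t0 * (1.0 + 0.15 * (f / k) ** 4)

def beckmann_derivative(links, flows, target, lam):
    """Derivative of the Beckmann objective along flows + lam*(target - flows)."""
    return sum((t - f) * cost(a, f + lam * (t - f))
               for a, f, t in zip(links, flows, target))

def frank_wolfe(links, demand, iters=50):
    # Step 1: all-or-nothing load at free-flow costs.
    flows = [0.0] * len(links)
    best = min(range(len(links)), key=lambda i: cost(links[i], 0.0))
    flows[best] = demand
    for _ in range(iters):
        # Step 2: all-or-nothing load at current costs (linearized subproblem).
        best = min(range(len(links)), key=lambda i: cost(links[i], flows[i]))
        target = [demand if i == best else 0.0 for i in range(len(links))]
        # Step 3: bisection line search for the optimal step lam in [0, 1].
        lo, hi = 0.0, 1.0
        if beckmann_derivative(links, flows, target, 0.0) >= 0.0:
            lam = 0.0
        else:
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if beckmann_derivative(links, flows, target, mid) > 0.0:
                    hi = mid
                else:
                    lo = mid
            lam = 0.5 * (lo + hi)
        flows = [f + lam * (t - f) for f, t in zip(flows, target)]
    return flows

links = [(10.0, 600.0), (15.0, 900.0)]  # (free-flow time, capacity)
flows = frank_wolfe(links, demand=1000.0)
print([round(f, 1) for f in flows])
# At user equilibrium, the costs on the two used links are (nearly) equal:
print([round(cost(a, f), 2) for a, f in zip(links, flows)])
```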

4.2 Unique Route Flows and Multi-class Link Flows

For project evaluation and scenario analyses, total link flows and class O-D flows may suffice. More detailed analyses, however, require O-D-route flows or class link flows. Neither is uniquely determined by the solution of the standard traffic assignment formulation. Computed route flows and class link flows may be quite arbitrary. A simple example in Bar-Gera et al. (2012) illustrates this dilemma, which is well known to researchers and advanced practitioners.

To choose among the infinite possibilities of route flow solutions for a UE model, an additional behavioral assumption is required. One plausible assumption is proportionality, namely, that the proportion of O-D flows assigned to each of two alternative route segments with precisely equal costs should be the same regardless of their origin or their destination. Proportionality also determines class link flows uniquely in multi-class assignments.

The fixed demand traffic assignment with proportionality can be solved in two ways. First, the standard assignment problem can be solved with an origin-based algorithm to a precise level of convergence, such as a relative gap equal to 1E-7. Then the route flows can be adjusted with a post-processing procedure to achieve the same proportions for each O-D flow over each pair of alternative segments, leaving the link flows unchanged. This procedure is now available in the TransCAD and VISUM software systems. Second, the proportionality condition can be used to design a new algorithm to solve the assignment problem. This approach was the basis for TAPAS (Bar-Gera 2010). Comparisons of solutions with TAPAS versus link-based and route-based tools for the Chicago regional road network were presented in Bar-Gera et al. (2012).

An example of route flows over a pair of alternative segments in the road network of the Chicago region is considered next. Two O-D matrices representing cars and trucks were assigned with TAPAS to the Chicago network by imposing the user-equilibrium principle with proportionality. The total flow of vehicles per hour in the matrices is 984,717 cars and 445,185 trucks in car equivalent units. The matrices were assigned to two networks: (a) an unrestricted network in which trucks can use any link and (b) a restricted network in which trucks are prohibited from using 563 car-only links (car-only lanes of two freeways, the Lake Shore Drive, and boulevards and other roads with truck prohibitions). According to the proportionality condition, class O-D flows using a pair of alternative segments should have the same



Fig. 2 Pair of alternative segments in the Chicago road network

proportion on each segment for each assignment. Since the generalized cost variable is defined to be travel time, the same proportions should be observed for cars and trucks over a pair of segments with no truck restrictions. A pair of segments connecting nodes 8,032 and 10,344, shown in Fig. 2, was selected for this example. Figure 3 compares the total O-D-segment flows on Segment 2 (y-axis) with the total O-D-segment flows on Segment 1 (x-axis). These flows lie on a straight line, showing that the condition of proportionality is imposed. The slopes of the lines are slightly different, indicating only a small change in the proportions between the solutions for the restricted and unrestricted networks. Although the alternative segments have precisely equal travel times in each of the two solutions, the total flows are somewhat different, as shown at the top of the figure for the two solutions. Figure 4 shows the car flows for Segments 1 and 2. Note that the slopes in Fig. 4 are the same as in Fig. 3. The truck flows, that is, the differences between the flows in Figs. 3 and 4, are not shown, but have the same slopes. The application of the condition of proportionality provides a meaningful and practical solution to the problem of nonuniqueness of route flows and multi-class link flows. TAPAS offers a method for rapidly and precisely solving the traffic assignment problem with proportionality.
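A small numerical sketch of the proportionality condition (Python; the O-D flows are made up, not the Chicago data): given several O-D pairs whose routes pass through the same pair of equal-cost alternative segments, post-processing rescales each O-D pair's two segment flows so that every pair uses the segments in the same proportion, while the total flow on each segment, and hence every link flow, is unchanged:

```python
# Each entry: one O-D pair's flow through the segment pair, currently
# split arbitrarily between Segment 1 and Segment 2 (an arbitrary UE solution).
od_splits = {"od_1": (12.0, 0.0), "od_2": (3.0, 9.0), "od_3": (10.0, 6.0)}

seg1_total = sum(s1 for s1, s2 in od_splits.values())
seg2_total = sum(s2 for s1, s2 in od_splits.values())
p1 = seg1_total / (seg1_total + seg2_total)  # common proportion on Segment 1

# Impose proportionality: every O-D pair uses Segment 1 with proportion p1,
# preserving each O-D pair's total through the segment pair.
proportional = {od: ((s1 + s2) * p1, (s1 + s2) * (1.0 - p1))
                for od, (s1, s2) in od_splits.items()}

print(round(p1, 3))
print({od: (round(a, 2), round(b, 2)) for od, (a, b) in proportional.items()})
# Segment totals (and so link flows) are unchanged by the adjustment:
print(round(sum(a for a, b in proportional.values()), 2), seg1_total)
print(round(sum(b for a, b in proportional.values()), 2), seg2_total)
```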

5 Conclusions

Despite 60 years of research on urban transportation network equilibrium, many problems remain unsolved, and practice lags increasingly behind research knowledge. Research problems may be broadly classified according to travel choices, network representation, network design, and solution procedures, among others.

Fig. 3 Total route flows over two networks: (a) 683 segment pairs on unrestricted network (squares); (b) 678 segment pairs on restricted network (triangles). Axes: Segment 1 O-D-segment flow (vph) versus Segment 2 O-D-segment flow (vph). Unrestricted flow (vph): seg 1 = 405.5, seg 2 = 203.1; restricted flow (vph): seg 1 = 393.9, seg 2 = 180.0

Fig. 4 Car route flows over two networks: (a) 680 segment pairs on unrestricted network (squares); (b) 670 segment pairs on restricted network (triangles). Axes: Segment 1 O-D-segment flow (vph) versus Segment 2 O-D-segment flow (vph). Unrestricted flow (vph): seg 1 = 259.2, seg 2 = 129.8; restricted flow (vph): seg 1 = 249.3, seg 2 = 113.9

(a) Until now, the modeling of travel choices has mainly followed the trip-based paradigm. Increasingly, travel demand modelers view travel in terms of daily tours or daily activities. Generally, prediction of tours or activities has been approached from a micro-simulation point of view. To the author's knowledge, the stability of such simulations has generally not been examined, such as by generating a sufficiently large sample of simulations and analyzing their variation. Model formulations of aggregate tour-based travel using concepts similar to those described in this chapter have been studied, but more investigation is needed (Bernardin et al. 2009).

(b) The representation of travel cost functions in network equilibrium models has advanced little beyond the original separable formulation of Beckmann. Although asymmetric models can be formulated as variational inequalities, the apparent lack of uniqueness of solutions has discouraged serious efforts to investigate this approach further. For example, the side-constrained method of Patriksson (1994, pp. 66–70) has not been investigated with large networks. Solution methods have generally not advanced beyond the so-called diagonalization (relaxation) method (Marcotte and Patriksson 2007, pp. 671–673).

(c) Another problem that has received very little attention is transportation network design. In a combinatorial sense, the network design problem is intractable because of its large size. Other approaches are possible, however, such as the spacing of freeways in a grid, as was considered early in the history of this field. A new approach to this problem might be to develop methods to generate and evaluate scenarios in a systematic and semiautomated manner. Such a method would require an ability to distinguish among the merits of closely related scenarios. Given the precision of solutions to network models now possible, this capability may now be achievable.

(d) Although practitioners are required by government regulations in the USA to solve their sequential travel forecasting procedures in a way that achieves an internal consistency of travel costs, there is no general agreement on how this should be done. No practitioner or software developer has described, tested, and demonstrated that one procedure is best among alternatives for complex models applied in practice. Moreover, the errors introduced in forecasts that do not achieve consistency remain unknown. This relatively straightforward research problem should be tackled.

(e) Academic interest in the solution of combined model formulations of travel choice has influenced travel forecasting practice, but only to a limited extent. Except for Santiago, Chile, combined models have rarely been applied in practice. The formulation of a combined model clearly enhances the understanding of the challenges facing the practitioner in solving the model sequence. Few practitioners, however, seem equipped by their training or mathematical ability to gain insights from these formulations. Moreover, software developers have not incorporated tools in their software systems to facilitate the application of this approach. Based upon past experience, they will not do so until interest among practitioners strongly induces them to proceed.

Pursuit of this research agenda requires a knowledge of optimization methods, computer skills, data, and perseverance. Many similar problems could be identified, especially from other perspectives. For those so inclined, the journey will be challenging, but always interesting, and hopefully rewarding.


Acknowledgments Professor Huw Williams, Cardiff University, offered many useful comments on earlier drafts of this chapter. Dr. Hillel Bar-Gera, Ben-Gurion University of the Negev, has offered many stimulating insights and contributions to my thinking on combined network equilibrium models during the past 15 years. Dr. Yu (Marco) Nie, Northwestern University, has been a stimulating colleague during my renewed association with my undergraduate alma mater. Their contributions are greatly appreciated. Remaining errors are my responsibility.

References

Abrahamsson T, Lundqvist L (1999) Formulation and estimation of combined network equilibrium models with applications to Stockholm. Transp Sci 33(1):80–100
Bar-Gera H (2002) Origin-based algorithm for the traffic assignment problem. Transp Sci 36(4):398–417
Bar-Gera H (2010) Traffic assignment by paired alternative segments. Transp Res B 44(8–9):1022–1046
Bar-Gera H, Boyce D (2003) Origin-based algorithms for combined travel forecasting models. Transp Res B 37(5):405–422
Bar-Gera H, Boyce D (2007) Some amazing properties of road traffic network equilibria. In: Friesz TL (ed) Network science, nonlinear science and infrastructure systems. Springer, Berlin, pp 305–335
Bar-Gera H, Boyce D, Nie Y (2012) User-equilibrium route flows and the condition of proportionality. Transp Res B 46(3):440–462
Beckmann M, McGuire CB, Winsten CB (1956) Studies in the economics of transportation. Yale University Press, New Haven
Bell MGH, Iida Y (1997) Transportation network analysis. Wiley, Chichester
Bernardin VL Jr, Koppelman F, Boyce D (2009) Enhanced destination choice models incorporating agglomeration related to trip chaining while controlling for spatial competition. Transp Res Rec 2131:143–151
Boyce D (2013) Beckmann's transportation network equilibrium model: its history and relationship to the Kuhn–Tucker conditions. Econ Transp 2(1):47–52
Boyce D, Bar-Gera H (2003) Validation of urban travel forecasting models combining origin–destination, mode and route choices. J Reg Sci 43(3):517–540
Boyce D, Bar-Gera H (2004) Multiclass combined models for urban travel forecasting. Netw Spat Econ 4(1):115–124
Boyce D, Williams H (2015) Forecasting urban travel. Elgar, Chichester
De Cea J, Fernandez JE, Soto A, Dekock V (2005) Solving network equilibrium on multimodal urban transportation networks with multiple user classes. Transp Rev 25(3):293–317
Dial RB (2006) A path-based user-equilibrium traffic assignment algorithm that obviates path storage and enumeration. Transp Res B 40(10):917–936
Erlander S, Stewart NF (1990) The gravity model in transportation analysis. VSP, Utrecht
Evans SP (1973) A relationship between the gravity model for trip distribution and the transportation problem in linear programming. Transp Res 7(1):39–61
Evans SP (1976) Derivation and analysis of some models for combining trip distribution and assignment. Transp Res 10(1):37–57
Florian M (2008) Models and software for urban and regional transportation planning: contributions of the Center for Research on Transportation. Infor 46(1):29–49
Florian M, Hearn D (1995) Network equilibrium models and algorithms. In: Ball MO, Magnanti TL, Monma CL, Nemhauser GL (eds) Network routing, handbooks in operations research and management science 8. Elsevier Science, Amsterdam, pp 485–550
Florian M, Wu JH, He S (2002) A multi-class multi-mode variable demand network equilibrium model with hierarchical logit structures. In: Gendreau M, Marcotte P (eds) Transportation and network analysis: current trends. Kluwer, Dordrecht, pp 119–133
Kuhn HW, Tucker AW (1951) Nonlinear programming. In: Neyman J (ed) Proceedings of the second Berkeley symposium on mathematical statistics and probability. University of California Press, Berkeley, pp 481–492
Lam WHK, Huang H-J (1992) A combined trip distribution and assignment model for multiple user classes. Transp Res B 26(4):275–287
Lee D-H, Meng Q, Deng W (2010) Origin-based partial linearization method of the stochastic user equilibrium traffic assignment problem. J Transp Eng-ASCE 136:52–60
Marcotte P, Patriksson M (2007) Traffic equilibrium. In: Barnhart C, Laporte G (eds) Transportation, handbooks in operations research and management science, vol 14. Elsevier Science, Amsterdam, pp 623–713
Nagurney A (1999) Network economics, 2nd edn. Kluwer, Boston
Oppenheim N (1995) Urban travel demand modeling. Wiley, New York
Ortúzar JD, Willumsen LG (2011) Modelling transport, 4th edn. Wiley, New York
Patriksson M (1994) The traffic assignment problem: models and methods. VSP, Utrecht
Sheffi Y (1985) Urban transportation networks. Prentice-Hall, Englewood Cliffs
Williams HCWL (1977) On the formation of travel demand models and economic evaluation measures of user benefit. Environ Plann 9(3):285–344
Wilson AG (1970) Entropy in urban and regional modeling. Pion, London

Supply Chains and Transportation Networks

Anna Nagurney

Contents
1 Introduction
2 Fundamental Decision-Making Concepts and Models
2.1 The User-Optimized Problem
2.2 The System-Optimized Problem
2.3 The Braess Paradox
3 Models with Asymmetric Link Costs
3.1 Variational Inequality Formulations of Fixed Demand Problems
3.2 Variational Inequality Formulations of Elastic Demand Problems
4 Conclusions
5 Cross-References
References

Abstract

We overview some of the major advances in supply chains and transportation networks, with a focus on their common theoretical frameworks and underlying behavioral principles. We emphasize that the foundations of supply chains as network systems can be found in the regional science and spatial economics literature. In addition, transportation network concepts, models, and accompanying methodologies have enabled the advancement of supply chain network models from a system-wide and holistic perspective. We discuss how the concepts of system-optimization and user-optimization have underpinned transportation network models and how they have evolved to enable the formulation of supply chain network problems operating (and managed) under centralized or decentralized, that is, competitive, decision-making behavior. We highlight some of the principal methodologies, including variational inequality theory, that have enabled the development of advanced transportation network equilibrium models as well as supply chain network equilibrium models.

Keywords
Supply chains · Transportation networks · User-optimization · System-optimization · Braess paradox · Variational inequalities

A. Nagurney (*), Department of Operations and Information Management, Isenberg School of Management, University of Massachusetts, Amherst, MA, USA. e-mail: [email protected]
© Springer-Verlag GmbH Germany, part of Springer Nature 2021. M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_47

1 Introduction

Supply chains are networks of suppliers, manufacturers, transportation service providers, storage facility managers, retailers, and consumers at the demand markets. Supply chains are the backbones of our globalized network economy and provide the infrastructure for the production, storage, and distribution of goods and associated services as varied as food products, pharmaceuticals, vehicles, computers, and other high-tech equipment, building materials, furniture, clothing, toys, and even electricity. Supply chains may operate (and be managed) in a centralized or decentralized manner and be underpinned not only by multimodal transportation and logistical networks but also by telecommunication as well as financial networks. In a centralized supply chain, there is a central entity or decision-maker, such as a firm, that controls the various supply chain network activities, whereas in a decentralized supply chain, there are multiple economic decision-makers, and the governing paradigm is that of competitive behavior among the relevant stakeholders, with different degrees of cooperation. For example, in a vertically integrated supply chain, the same firm may be responsible for production, storage, and distribution of its products. On the other hand, certain industry supply chain network structures may consist of competitive manufacturers, competitive distributors, as well as competing retailers. Nevertheless, the stakeholders involved in supply chains must cooperate to the extent that the products be received and processed as they move downstream in the supply chain (Nagurney 2006). The complexity and interconnectivity of some of today’s product supply chains have been vividly illustrated through the effects of recent natural disasters, including earthquakes, tsunamis, and even hurricanes, which have severed critical nodes and/or links and have disrupted the production and transportation of products, with major economic implications. Indeed, when supply chain disruptions occur, whether due to natural disasters, human error, attacks, or even market failure, the ramifications can propagate and impact the health and wellbeing of the citizenry thousands of miles away from the initially affected location (cf. Nagurney and Qiang 2009). Since supply chains are network systems, any formalism that seeks to model supply chains and to provide quantifiable insights and measures must be a system-wide one


and network-based. Such crucial issues as the stability and resiliency of supply chains, as well as their adaptability and responsiveness to events in a global environment of increasing risk and uncertainty, can only be rigorously examined from the view of supply chains as network systems (Nagurney 2006). Supply chains share many of the same characteristics as other network systems, including a large-scale nature and complexity of network topology; congestion, which leads to nonlinearities; alternative behavior of users of the networks, which may lead to paradoxical phenomena (recall the well-known Braess paradox in which the addition of a new road may increase the travel time for all); possibly conflicting criteria associated with optimization (the minimization of time for delivery, e.g., may result in higher emissions); interactions among the underlying networks themselves, such as the Internet with electric power networks, financial networks, and transportation and logistical networks; and the growing recognition of their fragility and vulnerability. Moreover, policies surrounding supply chain networks today may have major implications not only economically but also socially, politically, and security-wise. Although, historically, supply chain activities of manufacturing, transportation/distribution, as well as inventorying/storage have each, independently, received a lot of attention from both researchers and practitioners, the framework of supply chains views the various activities of production, transportation, and consumption in an integrated, holistic manner. Indeed, without the critical transportation links, what is manufactured cannot be delivered to points of demand. Moreover, needed inputs into the production processes/manufacturing links cannot be secured. While, beginning in the 1980s (cf. Handfield and Nichols Jr 1999), supply chains have captured wide interest among practitioners as well as researchers, it may be argued that the foundations of supply chain networks can be found in regional science and spatial economics, dating to the classical spatial price equilibrium models of Samuelson (1952) and Takayama and Judge (1971), with additional insights as to production processes, transportation, and distribution provided by Beckmann et al. (1956). For example, in spatial price equilibrium models, not only is production of the commodity in question considered at multiple locations or supply markets, with appropriate underlying functions, but also the consumption of the commodity at the demand markets, subject to appropriate functions (either demand or demand price), as well as the cost associated with transporting the commodity between pairs of the spatially separated supply and demand markets. Spatial price equilibrium models have evolved to include multiple commodities and multiple modes of transportation and may even include general underlying transportation networks. Moreover, with advances in theoretical frameworks, including, for example, the theory of variational inequalities (Nagurney 1999), one can now formulate and solve complex spatial price equilibrium problems with asymmetric supply price, demand price, and unit transportation/transaction cost functions (for which an optimization reformulation of the governing spatial price equilibrium conditions does not hold).
In addition, versions of spatial equilibrium models that capture oligopolistic behavior under imperfect, as opposed to perfect, competition serve as some of the basic supply chain network models in which competition is included, but, at the


same time, the important demand/consumption side is also captured (see Nagurney 1999 and the references therein). Interestingly, spatial price equilibrium problems can be reformulated and solved as transportation network equilibrium problems with elastic demands over appropriately constructed abstract networks or supernetworks (see Nagurney and Dong 2002). Hence, the plethora of algorithms that have been developed for transportation networks (cf. Sheffi 1985; Patriksson 1994; Nagurney 1999; Ran and Boyce 1996) can also be applied to compute solutions to spatial price equilibrium problems.

It is worth noting that Beckmann, McGuire, and Winsten, in their classical 1956 book, Studies in the Economics of Transportation, formulated transportation network equilibrium problems with elastic demands. They proved that, under the assumed user link cost functional forms and the travel disutility functional forms associated with the origin/destination pairs of nodes, the governing equilibrium conditions (now known as user-optimized conditions), in which no traveler has any incentive to alter his route of travel given that the behavior of others is fixed, could be reformulated and solved as an associated optimization problem. In their book, they also hypothesized that electric power generation and distribution networks, or in today's terminology, electric power supply chains, could be transformed into transportation network equilibrium problems. This has now been established (cf. Nagurney 2006 and the references therein). Today, the behavior of travelers on transportation networks is assumed to follow one of Wardrop's (1952) two principles of travel behavior, now renamed, according to Dafermos and Sparrow (1969), as user-optimized (selfish or decentralized) or system-optimized (unselfish or centralized). The former concept captures individuals' route-taking decision-making behavior, whereas the latter assumes a central controller that routes the flow on the network so as to minimize the total cost.

Moreover, a plethora of supply chain network equilibrium models, originated by Nagurney et al. (2002), have been developed in order to address competition among decision-makers in a tier of a supply chain, whether among the manufacturers, the distributors, the retailers, and/or even the consumers at the demand markets. Such models capture the behavior of the individual economic decision-makers, as in the case, for example, of profit maximization, and acknowledge that consumers also take transaction/transportation costs into consideration in making their purchasing decisions. Prices for the product associated with each decision-maker at each tier are obtained once the entire supply chain network equilibrium problem is solved, yielding also the equilibrium flows of the product on the links of the supply chain network. Such supply chain network equilibrium models also possess (as the spatial price equilibrium problems highlighted above) a transportation network equilibrium reformulation. Supply chain network models have been generalized to include electronic commerce options, multiple products, as well as risk and uncertainty on the demand side as well as on the supply side (cf. Nagurney 2006 and the references therein). In addition, product-specific supply chain network models have also been constructed to handle time-sensitive products (fast fashion, holiday based, and even critical needs as in disasters) as well as perishable products (such as food, cut flowers, certain


vaccines and medicines, etc.), using multicriteria decision-making formalisms for the former and generalized networks for the latter (see Masoumi et al. 2012). Both static and dynamic supply chain network models, including multiperiod ones with inventorying, have been formulated, solved, and applied. It is important to note that not all supply chains are commercial, and, in fact, given that the number of disasters is growing, as is the number of people affected by them, humanitarian supply chains have emerged as essential elements in disaster recovery. Unlike commercial or corporate supply chains, humanitarian supply chains are not managed using profit maximization as a decision-making criterion (since donors, e.g., would not approve); rather, cost minimization subject to demand satisfaction under uncertainty is relevant (see Nagurney and Qiang 2009). In addition, such supply chains may need to be constructed quickly, with the cognizant decision-makers working under conditions of damaged, if not destroyed, infrastructure and limited information.

Supply chain decision-making occurs at different levels: the strategic, tactical, and operational levels. Strategic decisions may involve where to locate manufacturing facilities and distribution centers, whereas tactical decisions may include with which suppliers to partner and which transportation service providers (carriers) to use. Decisions associated with operational supply chain decision-making would involve how much of the product to produce at which manufacturing plants, which storage facilities to use and how much to store where, as well as how much of the product should be supplied to the different retailers or points of demand. In addition, because of globalization, supply chain decision-making may now involve outsourcing decisions as well as the accompanying risk management.

Today, it has been argued that, increasingly, in the network economy it is not only competition within a product supply chain that is taking place but, rather, supply chain versus supply chain competition. Zhang et al. (2003) generalized Wardrop's first principle of travel behavior to formulate competition among supply chains. Location-based decisions are fundamental to supply chain decision-making, design, and management. Furthermore, such decisions affect spatial competition as well as trade, with Ohlin (1933) and Isard (1954) noting the need to integrate industrial location and international trade in a common framework. Nagurney (2010) constructed a system-optimization model that can be applied to the design or redesign of a supply chain network and has as endogenous variables both the capacities associated with the links (corresponding to manufacturing, transportation, and storage) and the operational flows of the product in order to meet the demands. The model has been extended in various directions to handle oligopolistic competition as well as product perishability in specific applications (cf. Masoumi et al. 2012 and the references therein).

At the same time that supply chains have become increasingly globalized, environmental concerns due to global warming and associated risks have drawn the attention of numerous constituencies. Firms are increasingly being held accountable not only for their own environmental performance but also for that of their suppliers, subcontractors, joint venture partners, distribution outlets, and, ultimately, even for the disposal of their products. Consequently, poor


environmental performance at any stage of the supply chain may damage the most important asset that a company has, which is its reputation. Hence, the topic of sustainable supply chain network modeling and analysis has emerged as an essential area for research and practice, as well as for policy analysis (see Boone et al. 2012).

2 Fundamental Decision-Making Concepts and Models

In this section, we interweave fundamental concepts in transportation that have been used successfully, and with wide application, in supply chain network modeling, analysis, operations management, and design. Our goal is to provide the necessary background from which additional explorations and advances can be made, in a readable and accessible format. As noted in the introduction, over half a century ago, Wardrop (1952) considered alternative possible behaviors of users of transportation networks, notably urban transportation networks, and stated two principles, which are named after him:

First principle: The journey times of all routes actually used are equal, and less than those which would be experienced by a single vehicle on any unused route.

Second principle: The average journey time is minimal.

The first principle corresponds to the behavioral principle in which travelers seek to (unilaterally) determine their minimal costs of travel; the second principle corresponds to the behavioral principle in which the total cost in the network is minimal. Beckmann et al. (1956) were the first to rigorously formulate these conditions mathematically and proved the equivalence between the transportation network equilibrium conditions, which state that all used paths connecting an origin/destination (O/D) pair will have equal and minimal travel times (or costs) (corresponding to Wardrop's first principle), and the Kuhn-Tucker conditions of an appropriately constructed optimization problem, under a symmetry assumption on the underlying functions. Hence, in this case, the equilibrium link and path flows could be obtained as the solution of a mathematical programming problem. Their fundamental result made the formulation, analysis, and subsequent computation of solutions to transportation network problems based on actual transportation networks realizable.

Dafermos and Sparrow (1969) coined the terms user-optimized (U-O) and system-optimized (S-O) transportation networks to distinguish between two distinct situations: one in which travelers act unilaterally, in their own self-interest, in selecting their routes, and one in which travelers choose routes/paths according to what is optimal from a societal point of view, in that the total cost in the network system is minimized. In the latter problem, marginal total costs rather than average costs are equilibrated. As noted above, the former problem coincides with Wardrop's first principle and the latter with Wardrop's second principle. Table 1 highlights the two distinct behavioral principles underlying transportation networks. The concept of "system-optimization" is also relevant to other types of "routing models" in transportation, including those concerned with the routing of freight.


Table 1 Distinct behavior on transportation networks

User-optimization (user equilibrium principle): user travel costs on used paths for each O/D pair are equalized and minimal.

System-optimization (system-optimality principle): marginals of the total travel cost on used paths for each O/D pair are equalized and minimal.

Dafermos and Sparrow (1969) also provided explicit computational procedures, that is, algorithms, to compute the solutions to such network problems in the case where the user travel cost on a link is a linear, increasing function of the flow on that link (increasing in order to handle congestion). Today, the concepts of user optimization versus system optimization also capture, respectively, decentralized versus centralized decision-making on supply chain networks, after the proper identifications are made (Boyce et al. 2005; Nagurney 2006). In this section, the basic transportation network models are first recalled, under distinct assumptions as to their operation and the underlying behavior of the users of the network. The models are classical and are due to Beckmann et al. (1956) and Dafermos and Sparrow (1969). In subsequent sections, we present more general models in which the user link cost functions are no longer separable but, rather, are asymmetric. For such models, we also provide the variational inequality formulations of the governing equilibrium conditions, since, in such cases, the governing equilibrium conditions can no longer be reformulated as the Kuhn-Tucker conditions of a convex optimization problem. The presentation follows that in Nagurney (2007), with the addition of material on supply chains and a synthesis. For easy accessibility, we recall the classical user-optimized network model in Sect. 2.1 and then the classical system-optimized network model in Sect. 2.2. The Braess (1968) paradox is, subsequently, highlighted in Sect. 2.3.

2.1 The User-Optimized Problem

The user-optimized network problem is also commonly referred to in the transportation literature as the traffic assignment problem or the traffic network equilibrium problem. Consider a general network $G = [N, L]$, where N denotes the set of nodes and L the set of directed links. Links connect pairs of nodes in the network and are denoted by a, b, etc. Let p denote a path consisting of a sequence of links connecting an origin/destination (O/D) pair of nodes. Paths are assumed to be acyclic and are denoted by p, q, etc. In transportation networks, nodes correspond to origins and destinations, as well as to intersections. Links, on the other hand, correspond to roads/streets in the case of urban transportation networks and to railroad segments in the case of train networks. A path in its most basic setting, thus, is a sequence of "roads" which comprise a route from an origin to a destination. In the supply chain network context, links correspond to supply chain activities (with appropriate


associated cost functions) and represent manufacturing, transportation/shipment, storage, etc. In addition, links can correspond to outsourcing links (see Nagurney 2006). Here we consider paths, rather than routes, since the former subsumes the latter. The network concepts presented here are sufficiently general to abstract not only transportation decision-making but also combined/integrated location-transportation decision-making as well as a spectrum of supply chain decisions. In addition, in the setting of supernetworks, that is, abstract networks, in which nodes need not correspond to locations in space (see Nagurney and Dong 2002), a path is viewed more broadly and need not be limited to a route-type decision but may, in fact, correspond not only to transportation but also to manufacturing and inventorying/storage decision-making.

Let $P_\omega$ denote the set of paths connecting the origin/destination (O/D) pair of nodes ω. Let P denote the set of all paths in the network, and assume that there are J origin/destination pairs of nodes in the set Ω. Let $x_p$ represent the nonnegative flow on path p, and let $f_a$ denote the flow on link a. All vectors here are assumed to be column vectors. The path flows on the network are grouped into the vector $x \in R^{n_P}_+$, where $n_P$ denotes the number of paths in the network. The link flows, in turn, are grouped into the vector $f \in R^{n_L}_+$, where $n_L$ denotes the number of links in the network. Assume, as given, the demand associated with each O/D pair ω, denoted by $d_\omega$, for $\omega \in \Omega$. In the network, the following conservation of flow equations must hold:

$$d_\omega = \sum_{p \in P_\omega} x_p \qquad \forall \omega \in \Omega \qquad (1)$$

where $x_p \ge 0$, $\forall p \in P$; that is, the sum of all the path flows between an origin/destination pair ω must be equal to the given demand $d_\omega$. In addition, the following conservation of flow equations must also hold:

$$f_a = \sum_{p \in P} x_p\, \delta_{ap} \qquad \forall a \in L \qquad (2)$$

where $\delta_{ap} = 1$ if link a is contained in path p, and zero otherwise. Expression (2) states that the flow on link a is equal to the sum of the flows on all paths p that contain (traverse) link a. Equations (1) and (2) guarantee that the flows in the network (be they travelers, products, etc.) are conserved, that is, are not lost in the network and arrive at the designated destinations from the origins. Let $c_a$ denote the user link cost associated with traversing link a, and let $C_p$ denote the user cost associated with traversing path p. Assume that the user link cost function is given by the separable function in which the cost on a link depends only on the flow on that link, that is,

$$c_a = c_a(f_a) \qquad \forall a \in L \qquad (3)$$


where $c_a$ is assumed to be continuous and increasing in the link flow $f_a$, in order to model the effect of the link flow on the cost and, in particular, congestion. The cost on a path is equal to the sum of the costs on the links that make up that path, that is,

$$C_p = \sum_{a \in L} c_a(f_a)\, \delta_{ap} \qquad \forall p \in P \qquad (4)$$
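The conservation relations (1)–(2) and the path cost definition (4) translate directly into code. The following minimal sketch (Python; the three-node network, paths, flows, and linear cost coefficients are hypothetical, not from the chapter) applies the path-link incidence implicitly to compute link flows from path flows and path costs from link costs:

```python
# Hypothetical network: links a, b, c, d; one O/D pair omega with demand 100,
# served by two paths: p1 = [a, c] and p2 = [b, d].
links = ["a", "b", "c", "d"]
paths = {"p1": ["a", "c"], "p2": ["b", "d"]}
x = {"p1": 60.0, "p2": 40.0}   # path flows satisfying Eq. (1): 60 + 40 = 100

# Eq. (2): link flow = sum of flows on paths containing the link.
f = {a: sum(x[p] for p, plinks in paths.items() if a in plinks) for a in links}

# Eq. (3): separable, increasing user link cost c_a(f_a) (illustrative linear form).
coef = {"a": (5.0, 0.02), "b": (4.0, 0.05), "c": (2.0, 0.01), "d": (3.0, 0.03)}
def c(a, fa):
    t0, slope = coef[a]
    return t0 + slope * fa

# Eq. (4): path cost = sum of link costs on the links of the path.
C = {p: sum(c(a, f[a]) for a in plinks) for p, plinks in paths.items()}
print(f)   # {'a': 60.0, 'b': 40.0, 'c': 60.0, 'd': 40.0}
print(C)   # user costs on p1 and p2 at these flows
```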

2.1.1 Transportation Network Equilibrium Conditions

In the case of the user-optimization (U-O) problem, one seeks to determine the path flow pattern $x^*$ (and the corresponding link flow pattern $f^*$) which satisfies the conservation of flow Eqs. (1) and (2) and the nonnegativity assumption on the path flows, and which also satisfies the transportation network equilibrium conditions given by the following statement: for each O/D pair $\omega \in \Omega$ and each path $p \in P_\omega$,

$$C_p \begin{cases} = \lambda_\omega & \text{if } x_p^* > 0 \\ \ge \lambda_\omega & \text{if } x_p^* = 0 \end{cases} \qquad (5)$$

In the user-optimization problem, there is no explicit optimization criterion, since users of the transportation network system act independently, in a noncooperative manner, until they cannot improve on their situations unilaterally; thus, an equilibrium is achieved, governed by the above equilibrium conditions. Conditions (5) are simply a mathematical restatement of Wardrop's (1952) first principle and mean that only those paths connecting an O/D pair that have equal and minimal user costs will be used. In Eq. (5), the minimal cost for O/D pair ω is denoted by $\lambda_\omega$, and its value is obtained once the equilibrium flow pattern is determined; otherwise, a user of the network could improve upon his situation by switching to a path with lower cost. Beckmann et al. (1956) established that the solution to the network equilibrium problem, in the case of user link cost functions of the form of Eq. (3), in which the cost on a link depends only on the flow on that link and is assumed to be continuous and increasing in the flow, could be obtained by solving the following optimization problem:

$$\text{Minimize} \quad \sum_{a \in L} \int_0^{f_a} c_a(y)\, dy \qquad (6)$$

subject to

$$\sum_{p \in P_\omega} x_p = d_\omega \qquad \forall \omega \in \Omega \qquad (7)$$

$$f_a = \sum_{p \in P} x_p\, \delta_{ap} \qquad \forall a \in L \qquad (8)$$

$$x_p \ge 0 \qquad \forall p \in P \qquad (9)$$

The objective function given by Eq. (6) is simply a device constructed to obtain a solution using general purpose convex programming algorithms. It does not possess the economic meaning of the objective function encountered in the system-optimization problem, which will be recalled below. Note that in the case of separable, as well as nonseparable but symmetric (to which we return later), user link cost functions, the $\lambda_\omega$ term in Eq. (5) corresponds to the Lagrange multiplier associated with constraint (7) for that O/D pair ω. However, in the case of nonseparable and asymmetric functions, there is no optimization reformulation of the transportation network equilibrium conditions (5), and the $\lambda_\omega$ term simply reflects the minimum user cost associated with the O/D pair ω at the equilibrium. As noted as early as Dafermos and Sparrow (1969), the above network equilibrium conditions also correspond to a Nash equilibrium (see Nash 1951). The equilibrium link flow pattern is unique for problem (6), subject to Eqs. (7)–(9), if the objective function (6) is strictly convex.

It has also been established (cf. Nagurney 2006 and the references therein) that multitiered supply chain network problems in which decision-makers (manufacturers, retailers, and even consumers) compete across a tier of the supply chain network but cooperate between tiers, as depicted in Fig. 1, can be transformed into a transportation network equilibrium problem using a supernetwork transformation, as in Fig. 2. In Fig. 2, the activities of manufacturing and retailer handling/storage are associated with the topmost and the third sets of links, respectively. The second and fourth sets of links from the top in Fig. 2 are the transportation links (as is the case with the links in Fig. 1). This connection provides a path flow efficiency interpretation of supply chain network equilibria; Nagurney (2006) utilized variational inequality theory (see below) to establish the equivalence.
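As a tiny worked instance of conditions (5) (Python; a hypothetical two-path network with linear costs, not from the chapter): with both paths used, their costs must be equal, so the equilibrium path flows can be found by bisection on the flow split, and the resulting equal path cost is $\lambda_\omega$:

```python
def path_costs(x1, demand):
    """Two paths for one O/D pair; separable, increasing costs (illustrative)."""
    x2 = demand - x1
    c1 = 10.0 + 0.04 * x1   # user cost on path 1
    c2 = 14.0 + 0.02 * x2   # user cost on path 2
    return c1, c2

demand = 300.0
lo, hi = 0.0, demand
for _ in range(60):                 # bisection on the flow assigned to path 1
    mid = 0.5 * (lo + hi)
    c1, c2 = path_costs(mid, demand)
    if c1 < c2:
        lo = mid                    # path 1 too cheap: shift more flow to it
    else:
        hi = mid
x1 = 0.5 * (lo + hi)
c1, c2 = path_costs(x1, demand)
print(round(x1, 2), round(demand - x1, 2))  # equilibrium path flows
print(round(c1, 3), round(c2, 3))           # equal costs = lambda_omega (Eq. 5)
```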

Fig. 1 The multitiered network structure of the supply chain (manufacturers 1, …, m; retailers 1, …, n; demand markets 1, …, o)

Fig. 2 The supernetwork representation of supply chain network equilibrium

2.2 The System-Optimized Problem

We now recall the system-optimized problem. As in the user-optimized problem of Sect. 2.1, the network G = [N, L], the demands associated with the origin/destination pairs, and the user link cost functions are assumed given. In the system-optimized problem, there is a central controller who routes the flows in an optimal manner so as to minimize the total cost in the network. This problem has direct relevance to the management of operations of a supply chain. The total cost on link a, denoted by ĉ_a(f_a), is given by

$$\hat{c}_a(f_a) = c_a(f_a) \times f_a \qquad \forall a \in L \qquad (10)$$

that is, the total cost on a link is equal to the user link cost on the link times the flow on the link. As noted earlier, in the system-optimized problem, there exists a central controller who seeks to minimize the total cost in the network system, which can correspond to a supply chain, where the total cost is expressed as

$$\sum_{a \in L} \hat{c}_a(f_a) \qquad (11)$$

and the total cost on a link is given by expression (10). The system-optimization (S-O) problem is, thus, given by

$$\text{Minimize} \quad \sum_{a \in L} \hat{c}_a(f_a) \qquad (12)$$

subject to the same conservation of flow equations as for the user-optimized problem, as well as the nonnegativity assumption on the path flows; that is, constraints (7), (8), and (9) must also be satisfied for the system-optimized problem. The total cost on a path, denoted by Ĉ_p, is the user cost on a path times the flow on the path, that is,

$$\hat{C}_p = C_p x_p \qquad \forall p \in P \qquad (13)$$

where the user cost on a path, C_p, is given by the sum of the user costs on the links that comprise the path (as in Eq. (4)), that is,

$$C_p = \sum_{a \in L} c_a(f_a)\, \delta_{ap} \qquad \forall p \in P \qquad (14)$$

In view of Eqs. (2), (3), and (4), one may express the cost on a path p as a function of the path flow variables; hence, an alternative version of the above system-optimization problem with objective function (12) can be stated in path flow variables only, where one now has the problem

$$\text{Minimize} \quad \sum_{p \in P} C_p(x)\, x_p \qquad (15)$$

subject to constraints (7) and (9).

2.2.1 System-Optimality Conditions

Under the assumption of increasing user link cost functions, the objective function (12) in the S-O problem is convex, and the feasible set consisting of the linear constraints (7)–(9) is also convex. Therefore, the optimality conditions, that is, the Kuhn–Tucker conditions, are as follows: for each O/D pair ω ∈ Ω and each path p ∈ P_ω, the flow pattern x (and corresponding link flow pattern f) satisfying Eqs. (7)–(9) must satisfy

$$\hat{C}'_p \begin{cases} = \mu_\omega & \text{if } x_p > 0 \\ \geq \mu_\omega & \text{if } x_p = 0 \end{cases} \qquad (16)$$

where Ĉ′_p denotes the marginal of the total cost on path p, given by

$$\hat{C}'_p = \sum_{a \in L} \frac{\partial \hat{c}_a(f_a)}{\partial f_a}\, \delta_{ap} \qquad (17)$$

evaluated in Eq. (16) at the solution, and μ_ω is the Lagrange multiplier associated with constraint (7) for that O/D pair ω.
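For contrast with the user-optimized solution computed earlier, the following sketch computes the system-optimizing flows for the same hypothetical two-link network (again with assumed cost functions c1(f1) = f1, c2(f2) = f2 + 2, and demand 4) by minimizing (12) and then checking condition (16), namely that used paths have equal marginal total costs (17).

```python
import numpy as np
from scipy.optimize import minimize

# System optimization (12) on the hypothetical two-link network;
# total cost per (10) is c_a(f_a) * f_a on each link.
demand = 4.0
total_cost = lambda f: f[0]**2 + (f[1]**2 + 2*f[1])
res = minimize(total_cost, x0=[2.0, 2.0], bounds=[(0, None)]*2,
               constraints=[{'type': 'eq', 'fun': lambda f: f.sum() - demand}])
f = res.x
marginals = np.array([2*f[0], 2*f[1] + 2.0])  # marginal total path costs, Eq. (17)
print(f.round(3), marginals.round(3))         # flows ~ [2.5, 1.5]; equal marginals ~ [5, 5]
```

Note that the S-O flows (2.5, 1.5) differ from the U-O flows (3, 1): the central controller shifts some flow off the congestible link to lower the total system cost.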


The system-optimization approach has been applied to supply chain networks in order to assess the synergy associated with a possible merger or acquisition before such a decision, which may be very costly, is made. Nagurney and Qiang (2009) overview such an approach, which assesses the total cost prior to and following the merger. The premerger supply chains of the individual firms are depicted in Fig. 3, whereas the post-merger supply chain network is given in Fig. 4. In Fig. 3, the topmost links correspond to the manufacturing links, followed by the transportation links ending in the storage/distribution facility links, followed by additional shipment links to the demand markets. In Fig. 4, on the other hand, the topmost links represent the merger/acquisition, with appropriate total cost functions assigned to those links.

Fig. 3 Case 0: firms A and B premerger

Fig. 4 Post-merger network
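As a rough numerical illustration of the synergy idea, one can compare the minimal total cost of the two stand-alone supply chains with that of the combined post-merger network. The quadratic total cost functions, demands, and network below are our own stylized assumptions, not data from Nagurney and Qiang (2009).

```python
import numpy as np
from scipy.optimize import minimize

# Synergy sketch under assumed quadratic total link costs. Pre-merger each plant
# serves only its own market; post-merger either plant can serve either market.
# Flows v = [plant A->market 1, A->2, plant B->1, B->2].
def total_cost(v):
    prod = np.array([v[0] + v[1], v[2] + v[3]])              # output per plant
    tc_prod = 2*prod[0]**2 + prod[1]**2                      # manufacturing costs
    tc_ship = v[0]**2 + 4*v[1]**2 + 4*v[2]**2 + v[3]**2      # shipment costs
    return tc_prod + tc_ship

demand = np.array([10.0, 10.0])
cons = [{'type': 'eq', 'fun': lambda v: v[0] + v[2] - demand[0]},
        {'type': 'eq', 'fun': lambda v: v[1] + v[3] - demand[1]}]

tc0 = total_cost(np.array([10.0, 0.0, 0.0, 10.0]))           # Case 0: no cross-shipments
res = minimize(total_cost, np.full(4, 5.0), bounds=[(0, None)]*4, constraints=cons)
print(tc0, round(res.fun, 2), f"synergy = {(tc0 - res.fun)/tc0:.1%}")
```

The positive synergy reported here simply reflects that the merged network can reallocate production toward the cheaper plant, which is the mechanism the total cost comparison is designed to capture.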

2.3 The Braess Paradox

In order to illustrate the difference between user-optimization and system-optimization in a concrete example and to reinforce the above concepts, we now recall the well-known Braess (1968) paradox (see also Braess et al. 2005). Assume a network as in the first network depicted in Fig. 5, in which there are four nodes: 1, 2, 3, 4; four links: a, b, c, d; and a single O/D pair ω₁ = (1, 4). There are, hence, two paths available to travelers between this O/D pair: p₁ = (a, c) and p₂ = (b, d). The user link travel cost functions are

$$c_a(f_a) = 10 f_a, \quad c_b(f_b) = f_b + 50, \quad c_c(f_c) = f_c + 50, \quad c_d(f_d) = 10 f_d.$$

Assume a fixed travel demand d_{ω₁} = 6. It is easy to verify that the equilibrium path flows are x*_{p₁} = 3 and x*_{p₂} = 3, the equilibrium link flows are f*_a = 3, f*_b = 3, f*_c = 3, f*_d = 3, and the associated equilibrium path travel costs are C_{p₁} = c_a + c_c = 83 and C_{p₂} = c_b + c_d = 83.

Assume now that, as depicted in Fig. 5, a new link e, joining node 2 to node 3, is added to the original network, with user link cost function c_e(f_e) = f_e + 10. The addition of this link creates a new path p₃ = (a, e, d) that is available to the travelers. The travel demand d_{ω₁} remains at six units of flow. The original flow pattern x*_{p₁} = 3 and x*_{p₂} = 3 is no longer an equilibrium pattern since, at this level of flow, the user cost on path p₃ is C_{p₃} = c_a + c_e + c_d = 70. Hence, users on paths p₁ and p₂ would switch to path p₃. The equilibrium flow pattern on the new network is x*_{p₁} = 2, x*_{p₂} = 2, and x*_{p₃} = 2, with equilibrium link flows f*_a = 4, f*_b = 2, f*_c = 2, f*_e = 2, and f*_d = 4 and with associated equilibrium user path travel costs C_{p₁} = C_{p₂} = C_{p₃} = 92. Indeed, one can verify that any reallocation of the path flows would yield a higher travel cost on some path. Note that the travel cost increased for every user of the network, from 83 to 92, without a change in the travel demand!

The system-optimizing solution for the first network in Fig. 5, on the other hand, is x_{p₁} = x_{p₂} = 3, with marginal total path costs given by Ĉ′_{p₁} = Ĉ′_{p₂} = 116. This would remain the system-optimizing solution even after the addition of link e, since the marginal cost of path p₃, Ĉ′_{p₃}, at this feasible flow pattern is equal to 130. The addition of a new link to a network cannot increase the total cost of the network system but can, of course, increase a user's cost, since travelers act individually.

Fig. 5 The Braess network example: the original four-link network (left) and the augmented network with the new link e (right)
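The paradox is easy to verify numerically. The sketch below (our own minimal implementation, not from the chapter) computes the user-optimized flows for both networks of Fig. 5 by minimizing the Beckmann objective (6) over path flows; the reported equilibrium path costs rise from 83 to 92 once link e is added.

```python
import numpy as np
from scipy.optimize import minimize

def user_equilibrium(Delta, demand, integrals, costs):
    """Minimize the Beckmann objective (6) over path flows (constraints (7)-(9))."""
    obj = lambda x: integrals(Delta @ x)
    cons = {'type': 'eq', 'fun': lambda x: x.sum() - demand}
    n = Delta.shape[1]
    res = minimize(obj, np.full(n, demand/n), bounds=[(0, None)]*n, constraints=[cons])
    x = res.x
    return x, Delta.T @ costs(Delta @ x)   # equilibrium path flows and path costs

# Original network: links a, b, c, d; paths p1 = (a, c), p2 = (b, d)
D1 = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
c1 = lambda f: np.array([10*f[0], f[1] + 50, f[2] + 50, 10*f[3]])
I1 = lambda f: 5*f[0]**2 + 0.5*f[1]**2 + 50*f[1] + 0.5*f[2]**2 + 50*f[2] + 5*f[3]**2

# Augmented network adds link e and path p3 = (a, e, d)
D2 = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]], dtype=float)
c2 = lambda f: np.array([10*f[0], f[1] + 50, f[2] + 50, 10*f[3], f[4] + 10])
I2 = lambda f: I1(f[:4]) + 0.5*f[4]**2 + 10*f[4]

for D, I, c in [(D1, I1, c1), (D2, I2, c2)]:
    x, Cp = user_equilibrium(D, 6.0, I, c)
    print(x.round(3), Cp.round(2))   # costs rise from 83 to 92 after adding link e
```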

3 Models with Asymmetric Link Costs

In this section, we consider network models in which the user cost on a link is no longer dependent solely on the flow on that link. We present a fixed demand transportation network equilibrium model in Sect. 3.1 and an elastic demand one in Sect. 3.2. We note that fixed demand supply chain network problems are relevant to applications in which there are good estimates of the demand, as would be the case in certain healthcare applications. Elastic demand supply chain network problems can capture the price sensitivity associated with the product and are used in profit-maximizing settings (cf. Nagurney 2006). Asymmetric link costs are also relevant in the case of competitive supply chain network equilibrium problems. Assume that the user link cost functions are now of a general form, that is, the cost on a link may depend not only on the flow on that link but also on other link flows on the network:

$$c_a = c_a(f) \qquad \forall a \in L \qquad (18)$$

In the case where the symmetry assumption holds, that is, ∂c_a(f)/∂f_b = ∂c_b(f)/∂f_a for all links a, b ∈ L, one can still reformulate the solution to the network equilibrium problem satisfying equilibrium conditions (5) as the solution to an optimization problem, albeit, again, with an objective function that is artificial and simply a mathematical device. However, when the symmetry assumption is no longer satisfied, such an optimization reformulation no longer exists, and one must appeal to variational inequality theory (cf. Nagurney 1999 and the references therein). Models of supply chains and transportation networks with asymmetric cost functions are important since they allow for the formulation, qualitative analysis, and, ultimately, solution of problems in which the cost on a link may depend on the flow on another link in a different way than the cost on the other link depends on that link's flow. It was in the domain of such network equilibrium problems that the theory of finite-dimensional variational inequalities realized its earliest success, beginning with the contributions of Smith (1979) and Dafermos (1980). For an introduction to the subject, as well as applications ranging from transportation network and spatial price equilibrium problems to financial equilibrium problems, see the book by Nagurney (1999). Below we present variational inequality formulations of both fixed demand and elastic demand network equilibrium problems.


The system-optimization problem, in turn, in the case of nonseparable (cf. Eq. (18)) user link cost functions becomes (see also Eq. (12))

$$\text{Minimize} \quad \sum_{a \in L} \hat{c}_a(f) \qquad (19)$$

subject to Eqs. (7)–(9), where ĉ_a(f) = c_a(f) × f_a, ∀a ∈ L. The system-optimality conditions remain as in Eq. (16), but now the marginal of the total cost on a path becomes, in this more general case,

$$\hat{C}'_p = \sum_{a, b \in L} \frac{\partial \hat{c}_b(f)}{\partial f_a}\, \delta_{ap} \qquad \forall p \in P \qquad (20)$$

3.1 Variational Inequality Formulations of Fixed Demand Problems

As mentioned earlier, in the case where the user link cost functions are no longer symmetric, one cannot compute the solution to the U-O, that is, the network equilibrium, problem using standard optimization algorithms. We emphasize, again, that such general cost functions are very important from an application standpoint since they allow for asymmetric interactions on the network. For example, allowing for asymmetric cost functions permits one to handle the situation in which the flow on a particular link affects the cost on another link in a different way than the cost on the particular link is affected by the flow on the other link.

First, the definition of a variational inequality problem is recalled. For further background, theoretical formulations, derivations, and proofs of the results below, see the books by Nagurney (1999) and Nagurney and Dong (2002) and the references therein. We provide the variational inequality formulations of the network equilibrium conditions in path flows as well as in link flows, since different formulations suggest different computational methods for solution. Specifically, the (finite-dimensional) variational inequality problem is defined as follows:

Definition 1 (Variational Inequality Problem)

The finite-dimensional variational inequality problem, VI(F, K), is to determine a vector X* ∈ K such that

$$\langle F(X^*),\, X - X^* \rangle \geq 0 \qquad \forall X \in K \qquad (21)$$

where F is a given continuous function from K to R^N, K is a given closed convex set, and ⟨·, ·⟩ denotes the inner product in R^N.


Variational inequality Eq. (21) is referred to as being in standard form. Hence, for a given problem, typically an equilibrium problem, one must determine the function F that enters the variational inequality problem, the vector of variables X, as well as the feasible set K. The variational inequality problem contains, as special cases, such well-known problems as systems of equations, optimization problems, and complementarity problems. Thus, it is a powerful unifying methodology for equilibrium analysis and computation and continues to be utilized for the formulation, analysis, and solution of a spectrum of supply chain network problems (cf. Nagurney 2006). A geometric interpretation of the variational inequality problem VI(F, K) is given in Fig. 6. Specifically, F(X*) is "orthogonal" to the feasible set K at the point X*.

Theorem 1 (Variational Inequality Formulation of Network Equilibrium with Fixed Demands: Path Flow Version)

A vector x* ∈ K¹ is a network equilibrium path flow pattern, that is, it satisfies equilibrium conditions (5), if and only if it satisfies the variational inequality problem

$$\sum_{\omega \in \Omega} \sum_{p \in P_\omega} C_p(x^*) \times (x_p - x_p^*) \geq 0 \qquad \forall x \in K^1 \qquad (22)$$

or, in vector form,

$$\langle C(x^*),\, x - x^* \rangle \geq 0 \qquad \forall x \in K^1 \qquad (23)$$

where C is the n_P-dimensional vector of path user costs and K¹ ≡ {x | x ≥ 0, such that Eq. (7) holds}.

Theorem 2 (Variational Inequality Formulation of Network Equilibrium with Fixed Demands: Link Flow Version)

A vector f* ∈ K² is a network equilibrium link flow pattern if and only if it satisfies the variational inequality problem

$$\sum_{a \in L} c_a(f^*) \times (f_a - f_a^*) \geq 0 \qquad \forall f \in K^2 \qquad (24)$$

or, in vector form,

$$\langle c(f^*)^T,\, f - f^* \rangle \geq 0 \qquad \forall f \in K^2 \qquad (25)$$

where c is the n_L-dimensional vector of link user costs and K² ≡ {f | there exists an x ≥ 0 satisfying Eqs. (7) and (8)}.

Fig. 6 Geometric interpretation of VI(F, K): at the solution X*, −F(X*) lies in the normal cone to the feasible set K at X*

One may put variational inequality Eq. (23) into standard form Eq. (21) by letting F ≡ C, X ≡ x, and K ≡ K¹. One may also put variational inequality Eq. (25) into standard form, where now F ≡ c, X ≡ f, and K ≡ K². Hence, fixed demand transportation network equilibrium problems in the case of asymmetric user link cost functions can be solved as variational inequality problems, as given above. The theory of variational inequalities (see Kinderlehrer and Stampacchia 1980; Nagurney 1999) allows one to qualitatively analyze the equilibrium patterns in terms of existence, uniqueness, as well as sensitivity and stability of solutions, and to apply rigorous algorithms for the numerical computation of the equilibrium patterns. Variational inequality algorithms usually resolve the variational inequality problem into a series of simpler subproblems, which, in turn, are often optimization problems, which can then be effectively solved using a variety of algorithms. We emphasize that the above network equilibrium framework is sufficiently general to also formalize the entire transportation planning process (consisting of origin selection, or destination selection, or both, in addition to route selection, in an optimal fashion) as path choices over an appropriately constructed abstract network or supernetwork. Further discussion can be found in the books by Nagurney (1999, 2000) and Nagurney and Dong (2002), who also developed more general models in which the costs (as described above) need not be separable and may be asymmetric.
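As a minimal computational illustration, the projection method, one of the simplest variational inequality algorithms, in which each subproblem is a Euclidean projection onto the feasible set, can be applied to a small fixed demand instance. The two-path network, its asymmetric cost functions, and the step size below are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

def project_simplex(v, d):
    # Euclidean projection onto {x >= 0, sum(x) = d}
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - d))[0][-1]
    theta = (css[rho] - d) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def path_costs(x):
    # Hypothetical asymmetric path costs: dC1/dx2 = 0.5 != 0.2 = dC2/dx1,
    # so no optimization reformulation of the equilibrium conditions exists.
    return np.array([2*x[0] + 0.5*x[1] + 10,
                     x[1] + 0.2*x[0] + 20])

demand, alpha = 10.0, 0.2
x = np.array([5.0, 5.0])
for _ in range(500):                       # x <- Proj_K(x - alpha F(x))
    x = project_simplex(x - alpha * path_costs(x), demand)
print(x.round(3), path_costs(x).round(3))  # ~ [6.522, 3.478]; equal costs ~ 24.78
```

Because the symmetrized Jacobian of this cost map is positive definite, the fixed-point iteration converges, and at the reported solution both used paths share the same user cost, as conditions (5) require.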

3.2 Variational Inequality Formulations of Elastic Demand Problems

We now describe a general network equilibrium model with elastic demands due to Dafermos (1982), but we present the single-modal version for simplicity. It is assumed that one has associated with each O/D pair ω in the network a travel disutility function λ_ω, where here the general case is considered in which the disutility may depend upon the entire vector of demands, which are no longer fixed but are now variables, that is,

$$\lambda_\omega = \lambda_\omega(d) \qquad \forall \omega \in \Omega \qquad (26)$$


where d is the J-dimensional vector of the demands. The notation is as described earlier, except that here we also consider user link cost functions of the general form Eq. (18). The conservation of flow equations (see also Eqs. (1) and (2)), in turn, are given by

$$f_a = \sum_{p \in P} x_p \delta_{ap} \qquad \forall a \in L \qquad (27)$$

$$d_\omega = \sum_{p \in P_\omega} x_p \qquad \forall \omega \in \Omega \qquad (28)$$

$$x_p \geq 0 \qquad \forall p \in P \qquad (29)$$

In the elastic demand case, the demands in expression (28) are variables and no longer given, in contrast to the fixed demand expression in Eq. (1). The network equilibrium conditions (see also Eq. (5)) take the following form in the elastic demand case: for every O/D pair ω ∈ Ω and each path p ∈ P_ω, a vector of path flows and demands (x*, d*) satisfying Eqs. (28) and (29) (which induces a link flow pattern f* through Eq. (27)) is a network equilibrium pattern if it satisfies

$$C_p(x^*) \begin{cases} = \lambda_\omega(d^*) & \text{if } x_p^* > 0 \\ \geq \lambda_\omega(d^*) & \text{if } x_p^* = 0 \end{cases} \qquad (30)$$

Equilibrium conditions (30) state that the costs on used paths for each O/D pair are equal and minimal and equal to the disutility associated with that O/D pair. Costs on unutilized paths can exceed the disutility. Observe that in the elastic demand model users of the network can forego travel altogether for a given O/D pair if the user costs on the connecting paths exceed the travel disutility associated with that O/D pair. This model, hence, allows one to ascertain the attractiveness of different O/D pairs based on the ultimate equilibrium demand associated with the O/D pairs. In addition, this model can handle such situations as the equilibrium determination of employment location and route selection, or residential location and route selection, or residential and employment selection as well as route selection through the appropriate transformations via the addition of links and nodes and given, respectively, functions associated with the residential locations, the employment locations, and the network overall (cf. Nagurney 1999; Nagurney and Dong 2002). In the next two theorems, both the path flow version and the link flow version of the variational inequality formulations of the network equilibrium conditions (30) are presented. These are analogues of the formulations Eqs. (22), (23), (24) and (25), respectively, for the fixed demand model and are due to Dafermos (1982).
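A minimal sketch of computing such an elastic demand equilibrium is given below. The two-path instance, the linear disutility function, and the step size are our own illustrative assumptions; substituting d = x1 + x2 turns conditions (30) into a variational inequality over the nonnegative orthant, which the projection iteration solves.

```python
import numpy as np

# Hypothetical single-O/D, two-path instance with elastic demand:
# at equilibrium, used paths satisfy C_p(x) = lambda(d), per conditions (30).
def F(x):
    C = np.array([2*x[0] + 10, x[1] + 15])  # path user costs
    lam = 100 - 2*x.sum()                   # travel disutility at demand d = x1 + x2
    return C - lam                          # VI map obtained from conditions (30)

x = np.array([1.0, 1.0])
for _ in range(2000):
    x = np.maximum(x - 0.1*F(x), 0.0)       # projection onto {x >= 0}
print(x.round(3))                           # ~ [12.5, 20.0]; C1 = C2 = lambda = 35
```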


Theorem 3 (Variational Inequality Formulation of Network Equilibrium with Elastic Demands: Path Flow Version)

A vector (x*, d*) ∈ K³ is a network equilibrium path flow pattern, that is, it satisfies equilibrium conditions (30), if and only if it satisfies the variational inequality problem

$$\sum_{\omega \in \Omega} \sum_{p \in P_\omega} C_p(x^*) \times (x_p - x_p^*) - \sum_{\omega \in \Omega} \lambda_\omega(d^*) \times (d_\omega - d_\omega^*) \geq 0 \qquad \forall (x, d) \in K^3 \qquad (31)$$

or, in vector form,

$$\langle C(x^*),\, x - x^* \rangle - \langle \lambda(d^*),\, d - d^* \rangle \geq 0 \qquad \forall (x, d) \in K^3 \qquad (32)$$

where λ is the J-dimensional vector of disutilities and K³ ≡ {(x, d) | x ≥ 0, such that Eq. (28) holds}.

Theorem 4 (Variational Inequality Formulation of Network Equilibrium with Elastic Demands: Link Flow Version)

A vector (f*, d*) ∈ K⁴ is a network equilibrium link flow pattern if and only if it satisfies the variational inequality problem

$$\sum_{a \in L} c_a(f^*) \times (f_a - f_a^*) - \sum_{\omega \in \Omega} \lambda_\omega(d^*) \times (d_\omega - d_\omega^*) \geq 0 \qquad \forall (f, d) \in K^4 \qquad (33)$$

or, in vector form,

$$\langle c(f^*),\, f - f^* \rangle - \langle \lambda(d^*),\, d - d^* \rangle \geq 0 \qquad \forall (f, d) \in K^4 \qquad (34)$$

where K⁴ ≡ {(f, d) | there exists an x ≥ 0 satisfying Eqs. (27) and (28)}.

Under the symmetry assumption on the disutility functions, that is, if ∂λ_w/∂d_ω = ∂λ_ω/∂d_w for all w, ω, in addition to such an assumption on the user link cost functions (see the discussion following Eq. (18)), one can obtain (see Beckmann et al. 1956) an optimization reformulation of the network equilibrium conditions (30), which in the case of separable user link cost functions and disutility functions is given by

$$\text{Minimize} \quad \sum_{a \in L} \int_0^{f_a} c_a(y)\, dy - \sum_{\omega \in \Omega} \int_0^{d_\omega} \lambda_\omega(z)\, dz \qquad (35)$$

subject to Eqs. (27)–(29).

Variational inequality theory has become a fundamental methodological framework for the formulation and solution of competitive supply chain problems in which the governing concept is that of Nash equilibrium (see, e.g., Masoumi et al. 2012). In Fig. 7, a competitive supply chain network is depicted in which the firms have vertically integrated supply chains but compete in common demand markets. The topmost links represent manufacturing activities at different plants, with multiple such links denoting alternative manufacturing technologies. The second set of links from the top reflects transportation, with alternative links depicting the possibility of alternative modes of transportation. The next set of links corresponds to storage at the distribution centers, and the final set of links to transportation to the demand markets; here we also use multiple links to denote alternative technologies and transportation modes, respectively. The costs on the links can be separable or not, and asymmetric, depending on the specific product application. Product differentiation and branding have also been incorporated into such supply chain networks using variational inequality theory. Observe that in the supply chain network depicted in Fig. 7, direct shipments from the manufacturing plants to the demand points/retailers are allowed and are depicted by the corresponding links.

Fig. 7 The competitive supply chain network topology (firms 1, …, I, each with manufacturing plants and distribution centers, serving common demand markets R₁, …, R_{n_R})


Finally, it is important to emphasize that the dynamics of the underlying interactions can be formulated, and have been, using projected dynamical systems (Nagurney and Zhang 1996).

4 Conclusions

In this chapter, we have highlighted some of the major advances in supply chains and transportation networks, with a focus on the elements common to their theoretical frameworks and underlying behavioral principles. We have also argued that the foundations of supply chains as network systems can be found in the regional science and spatial economics literature. Specifically, we have discussed how the concepts of system-optimization and user-optimization have underpinned transportation network models and, more recently, have evolved to enable the formulation of supply chain network problems operating (and managed) under centralized or decentralized, that is, competitive, decision-making behavior. We have also highlighted some of the principal methodologies, including variational inequality theory, that have enabled the development not only of advanced transportation network equilibrium models but also of supply chain network equilibrium models. We have aimed to include both primary and tertiary references; the interested reader can delve further according to interest. In conclusion, transportation network concepts, models, and accompanying methodologies have enabled the advancement of supply chain network models from a system-wide and holistic perspective.

5 Cross-References

▶ Computable Models of Dynamic Spatial Oligopoly from the Perspective of Differential Variational Inequalities
▶ Network Equilibrium Models for Urban Transport
▶ Spatial Network Analysis

References

Beckmann MJ, McGuire CB, Winsten CB (1956) Studies in the economics of transportation. Yale University Press, New Haven
Boone T, Jayaraman V, Ganeshan R (2012) Sustainable supply chains: models, methods, and public policy implications. Springer, New York
Boyce DE, Mahmassani HS, Nagurney A (2005) A retrospective on Beckmann, McGuire, and Winsten's studies in the economics of transportation. Pap Reg Sci 84:85–103
Braess D (1968) Über ein Paradoxon der Verkehrsplanung. Unternehmensforschung 12:258–268


Braess D, Nagurney A, Wakolbinger T (2005) On a paradox of traffic planning, translation of the original D. Braess paper from German to English. Transp Sci 39:446–450
Dafermos S (1980) Traffic equilibrium and variational inequalities. Transp Sci 14:42–54
Dafermos S (1982) The general multimodal network equilibrium problem with elastic demand. Networks 12:57–72
Dafermos SC, Sparrow FT (1969) The traffic assignment problem for a general network. J Res Natl Bur Stand 73B:91–118
Handfield RB, Nichols EL Jr (1999) Introduction to supply chain management. Prentice-Hall, Englewood Cliffs
Isard W (1954) Location theory and trade theory: short-run analysis. Q J Econ 68:305–320
Kinderlehrer D, Stampacchia G (1980) An introduction to variational inequalities and their applications. Academic Press, New York
Masoumi AH, Yu M, Nagurney A (2012) A supply chain generalized network oligopoly model for pharmaceuticals under brand differentiation and perishability. Transp Res E 48:762–780
Nagurney A (1999) Network economics: a variational inequality approach, 2nd revised edn. Kluwer, Dordrecht
Nagurney A (2000) Sustainable transportation networks. Edward Elgar, Cheltenham
Nagurney A (2006) Supply chain network economics: dynamics of prices, flows and profits. Edward Elgar, Cheltenham
Nagurney A (2007) Mathematical models of transportation and networks. In: Zhang W-B (ed) Encyclopedia of life support systems (EOLSS), Mathematical models in economics. UNESCO, Paris
Nagurney A (2010) Optimal supply chain network design and redesign at minimal total cost and with demand satisfaction. Int J Prod Econ 128:200–208
Nagurney A, Dong J (2002) Supernetworks: decision-making for the Information Age. Edward Elgar, Cheltenham
Nagurney A, Qiang Q (2009) Fragile networks: identifying vulnerabilities and synergies in an uncertain world. Wiley, Hoboken
Nagurney A, Zhang D (1996) Projected dynamical systems and variational inequalities with applications. Kluwer, Norwell
Nagurney A, Dong J, Zhang D (2002) A supply chain network equilibrium model. Transp Res E 38:281–303
Nash JF (1951) Noncooperative games. Ann Math 54:286–298
Ohlin B (1933) Interregional and international trade. Harvard University Press, Cambridge, MA
Patriksson M (1994) The traffic assignment problem. VSP, Utrecht
Ran B, Boyce DE (1996) Modeling dynamic transportation networks, 2nd revised edn. Springer, Berlin
Samuelson PA (1952) Spatial price equilibrium and linear programming. Am Econ Rev 42:283–303
Sheffi Y (1985) Urban transportation networks. Prentice-Hall, Englewood Cliffs
Smith MJ (1979) Existence, uniqueness, and stability of traffic equilibria. Transp Res B 13:259–304
Takayama T, Judge GG (1971) Spatial and temporal price and allocation models. North-Holland, Amsterdam
Wardrop JG (1952) Some theoretical aspects of road traffic research. Proc Inst Civil Eng 1(II):325–378
Zhang D, Dong J, Nagurney A (2003) A supply chain network economy: modeling and qualitative analysis. In: Nagurney A (ed) Innovations in financial and economic networks. Edward Elgar, Cheltenham, pp 197–213

Computable Models of Dynamic Spatial Oligopoly from the Perspective of Differential Variational Inequalities

Terry L. Friesz and Amir H. Meimand

Contents
1 Introduction
2 The Notion of a Nash Equilibrium
3 Competitive Oligopoly
3.1 Competitive Spatial Oligopoly
3.2 Variational Inequality (VI) Formulations of Spatial Oligopoly
3.3 Diagonalization Algorithm for Variational Inequality
4 Static Competitive Network Oligopoly
5 Dynamic Network Oligopoly
5.1 Notation
5.2 The Firm's Objective Functional, Dynamics, and Constraints
5.3 The Differential Variational Inequality Formulation of Dynamic Network Oligopoly
5.4 Discrete-Time Approximation
5.5 A Comment About Path Variables
5.6 Numerical Example
5.7 Interpretation of Numerical Results
6 Conclusions
7 Cross-References
References

Abstract

We begin this chapter with the basic definition of Nash equilibrium and the formulation of static spatial and network oligopoly models as variational inequalities (VIs), which can be solved by well-known numerical methods presented in the literature. We then move on to dynamic network oligopoly models and show that the differential Nash game describing competitive network oligopoly may be articulated as a differential variational inequality (DVI) involving both control and state variables. Finite-dimensional time discretization is employed to approximate the model as a mathematical program, which may be solved by the multistart global optimization scheme found in the off-the-shelf software package GAMS when used in conjunction with the commercial solver MINOS. We also present a small-scale numerical example of differential network oligopoly approached from the DVI perspective.

Keywords

Spatial Oligopoly · Dynamic Network Oligopoly · Differential Variational Inequalities

1 Introduction

The theory of competitive oligopoly is reviewed in Greenhut et al. (1987) and Greenhut and Lane (1989), and some basic models of spatial oligopoly are presented by Raa (1984), Dafermos and Nagurney (1987), Henderson and Quandt (1980), Novshek (1980), Matsushima and Matsumura (2003), and Taheri et al. (2014). Moreover, Dafermos and Nagurney (1987), Harker (1984), Nagurney (1999), and Nagurney and Dong (2016a) studied the variational inequality (VI) approach to determining market equilibrium for a general static oligopoly model. The intention of this chapter is to construct computable general equilibrium models for static and dynamic spatial oligopoly with a computational rather than theoretical orientation. That said, it is nonetheless necessary to give a mathematical form to oligopoly models and explore some of their qualitative properties. Accordingly, we begin this chapter with the definition of Nash equilibrium and the formulation of static spatial and network oligopoly as variational inequalities in order to set the stage for the dynamic models we study subsequently. In the network case, a few firms compete as Nash agents in the output market for a single homogeneous commodity. The firms are located at nodes of a transportation network in which common freight tariffs, expressed as a fee per unit of flow on each network arc, are known and faced by each oligopolist. We then move on to modeling and computing dynamic network oligopoly described by the notion of a differential Nash equilibrium and mathematically expressed as a differential variational inequality (DVI). Models like those presented in this chapter arise when constructing spatial computable general equilibrium models (Tobin and Friesz 1983; Friesz and Harker 1984; Dafermos and Nagurney 1987; Beckmann and Puu 1990; Friesz 1993), as well as partial equilibrium models, when detailed freight flows are emphasized for a specific application. Throughout, our emphasis is on computation rather than theory, and our style is that of a simple primer in order to make the material accessible to the widest possible audience. Although some of the network models considered herein are notationally complex, our presentation relies only on very basic notions from microeconomic theory and elementary optimization.

2 The Notion of a Nash Equilibrium

Suppose there are N players in a game, and each player i ∈ [1, N] ⊂ I₊₊ chooses a feasible strategy tuple x^i from the strategy set Ω^i to maximize the utility function Θ^i: Ω^i → ℜ¹, which will generally depend on the other players' strategies. So every player i ∈ [1, N] is trying to solve his/her best response problem:

$$\max \left\{ \Theta^i\bigl(x^i;\, x^{-i}\bigr) : x^i \in \Omega^i \right\} \qquad (1)$$

Note that, in Eq. (1), we use the notation

$$x^{-i} = \bigl( x^j : j \neq i,\ j \in [1, N] \bigr)$$

to refer to the tuple of strategies of players other than i. It is assumed that the non-own tuple x^{-i} is known to player i. A Nash equilibrium is the vector of strategies

$$x^* = \bigl( x^{i*} : i \in [1, N] \bigr)$$

formed from player-specific tuples such that each x^{i*} solves the mathematical program (1). We denote a Nash equilibrium by NE(Θ, Ω) (Friesz 2010).

3 Competitive Oligopoly

Consider N firms i = 1, 2, …, N, which supply a homogeneous commodity in a noncooperative fashion. Let π(Q): ℜ¹₊₊ → ℜ¹₊₊ denote the inverse demand function, where Q ≥ 0 is the total supply in the market and q^i ≥ 0 is the supply of firm i:

$$Q = \sum_{i=1}^{N} q^i. \qquad (2)$$

Also, let V^i(q^i): ℜ¹₊₊ → ℜ¹₊₊ denote the total production cost of firm i to supply q^i units. Given the other firms' strategies q^{-i}, firm i seeks to maximize its total profit Θ^i(q^i; q^{-i}): ℜ^{|N|}₊₊ → ℜ¹ by solving its best response problem:

$$\max_{q^i} \ \Theta^i\bigl(q^i;\, q^{-i}\bigr) = q^i \pi(Q) - V^i\bigl(q^i\bigr) \qquad (3)$$

subject to

$$Q = \sum_{i=1}^{N} q^i, \qquad q^i \geq 0.$$

It is obvious that there is a Nash game between the players (firms), each choosing a feasible strategy q^i ≥ 0, with every player's objective function depending on all the other players' strategies. This game can be defined as a Nash equilibrium NE(Θ, Ω) (Harker 1984). If V^i(·) is convex and continuously differentiable for i = 1, 2, …, N, the inverse demand function π(·) is strictly decreasing and continuously differentiable, and the industry revenue function Qπ(Q) is concave, then q* = (q^{1*}, q^{2*}, …, q^{N*}) ∈ ℜ^{|N|}₊₊ is a Nash equilibrium solution if and only if

$$\text{(i)} \quad \left[ \pi\bigl(Q^*\bigr) + q^{i*} \nabla_{q^i} \pi\bigl(Q^*\bigr) - \nabla_{q^i} V^i\bigl(q^{i*}\bigr) \right] q^{i*} = 0 \qquad \forall i = 1, 2, \ldots, N$$

$$\text{(ii)} \quad \pi\bigl(Q^*\bigr) + q^{i*} \nabla_{q^i} \pi\bigl(Q^*\bigr) - \nabla_{q^i} V^i\bigl(q^{i*}\bigr) \leq 0 \qquad \forall i = 1, 2, \ldots, N$$

where

$$Q^* = \sum_{i=1}^{N} q^{i*}, \qquad q^{i*} \geq 0.$$

Conditions (i) and (ii) are the first-order necessary (and sufficient) conditions for the set of problems defined by Eq. (3) for each i = 1, 2, …, N (Murphy et al. 1982).

Competitive Spatial Oligopoly

Now we can generalize the classical oligopoly model to the spatial case. Assume that there are N firms and M demand markets that are generally spatially separated. Assume the homogenous commodity is produced by N firms and consumed in M markets. Let qi denote the output of firm i  [1, N]  I++ and c j denote the demand for the commodity at market j  [1, M]  I++. Let sij denote the commodity shipment form firm i to the market j. The following conservation of flow equations must hold: qi 

M X j¼1

cj 

N X i¼1

sij ¼ 0

8i ¼ 1, 2, . . . , N

ð4Þ

sij ¼ 0

8j ¼ 1, 2, . . . , M

ð5Þ

Computable Models of Dynamic Spatial Oligopoly from the Perspective of. . .

305

where 0  c  cmax , 0  q  qmax , 0  s  smax :

ð6Þ

Note that in these constraints, the column vectors jNj

q  ℜþþ jMj

c  ℜþþ jNjjMj

s  ℜþþ

denote the production outputs, demands, and the commodity shipments. As previously, V i(qi) is the total production cost for firm i to supply qi units, r ij is the freight rate (tariff) charged per unit of flow sij , and π j(c j) is the inverse demand function at market j. To consider the general situation, we assume the production cost may depend on the entire jNj production pattern V i ¼ V i ðqÞ: ℜþþ ! ℜ1þþ , the freight rate (tariff) may depend on jNjjMj the entire shipment pattern, r ij ¼ r ij ðsÞ: ℜþþ ! ℜ1þþ , and the inverse demand function at market j may depend on the entire consumption pattern, π j ¼ π j ðcÞ: jMj ℜþþ ! ℜ1þþ . Note that the profit of firm i, as in Dafermos and Nagurney (1987), is Θi ðc, q, sÞ ¼

M X j¼0

π j ðcÞsij  V i ðqÞ 

M X j¼0

r ij ðsÞsij :

ð7Þ

In light of constraints (Eqs. (4) and (5)), one my write Θ ¼ ΘðsÞ: Next we consider the usual oligopolistic market mechanism, in which the N firms behave as noncooperative agents and satisfy demand while maximizing their own profit. We seek to determine a commodity shipment pattern s for which the N firms will be in a state of equilibrium a fundamental concept explained in Nagurney (1999).

Definition 1 (Spatial Cournot-Nash Equilibrium).

A commodity shipment pattern s is said to constitute a Cournot-Nash equilibrium if for each for i ¼ 1, 2, . . . , N:        Θi sji , sji  Θi sji , sji where Ω ¼ {Eqs. (4), (5), (6)}.

8sji  Ω

306

T. L. Friesz and A. H. Meimand

3.2

Variational Inequality (VI) Formulations of Spatial Oligopoly

In this section we present the variational inequality (VI) formulation of the CournotNash equilibrium. Letting x ¼ ðc, q, sÞ we make use of the following theorem: Theorem 1 (Nash equilibrium equivalent to a variational inequality).

The general Nash equilibrium NE(Θ, Ω) is equivalent to the following variational inequality VI(∇Θ, Ω): 

find x  Ω such that

   T   ∇Θ x xx 0

ð8Þ

8x  Ω

the following regularity conditions hold: (i) each Θi(x) : Ωi ! ℜ1 is convex and continuously differentiable in xi; and (ii) each Ωi is a closed convex subset of ℜni . Proof See (Friesz 2010) ■ Applying Theorem 1, the competitive oligopoly problem may be formulated as the following VI (Nagurney 1999): 

find x  0

ð9Þ

such that N  X i¼1 

where Q ¼

N X

          ∇qi V i qi  π Q  ∇qi π Q qi qi  qi  0



qi and qi  0:

i¼1

3.3

Diagonalization Algorithm for Variational Inequality

In this section, the diagonalization algorithm (or diagonalization for short) discussed is one of several algorithms that can be used to solve a variational inequality. It is an algorithmic philosophy very similar to the Gauss-Seidel method familiar from the

Computable Models of Dynamic Spatial Oligopoly from the Perspective of. . .

307

numerical analysis literature (Friesz 2010). Diagonalization is appealing for solving finite-dimensional variational inequalities because the resulting subproblems are all nonlinear programs that can be efficiently solved with well-understood nonlinear programming algorithms, which are often available in the form of commercial software. This fact notwithstanding diagonalization may fail to converge, and its use on large-scale problems can be frustrating. We next describe the algorithm, again making use of the shorthand x ¼ ðc, q, sÞ Step 0. Initialization. Determine a feasible initial solution x0  Ω, and set k ¼ 0. k Step  1. Diagonalize  and solve the subproblem. We create separable functions Fi ðxi Þ Fi xi ; xkj 8j 6¼ i and then proceed to solve the variational inequality N X i¼1

   Fki xki xi  xki  0

x, xk  Ω

by solving the mathematical program max J ðxÞ ¼

N ð xi X i¼1

0

Fki ðxi Þ dxi

xΩ

and call the solution xk+1. Step 2. For a suitably small positive constant ξ  ℜ1þþ , if  xki j ξ max j xkþ1 i

i  ½1, N 

stop; otherwise set k ¼ k + 1 and go to Step 1. The other numerical methods for VI such as gap function method, fixed-point method, and successive linearization method and Lemke’s algorithm are presented in Friesz (2010) with some numerical examples.

4

Static Competitive Network Oligopoly

As background for static competitive network oligopoly, we imagine a transportation network that allows firms to ship their commodity to multiple markets through multiple paths. In this section, we consider the impact of spatial transportation networks on the firms’ economic decisions. For the time being, we restrict our consideration to the static case, while in a later section, we will extend this model

308

T. L. Friesz and A. H. Meimand

to the dynamic case. Several papers exist in the literature about modeling spatial network oligopoly. For example, Hakimi (1983) chooses locations for new facilities in an oligopolistic environment. Tobin and Friesz (1983) show that an equivalent optimization problem for spatial price network equilibrium may be formulated without path variables. Moreover, Rovinskey et al. (1980) developed a model for Cournot-Nash equilibrium of firms who are spatially separated but supply a commodity to a single market. Hashimoto (1985) and Harker (1986) formulated more general Cournot-Nash models in which consumers are also spatially dispersed. More recently Nagurney (2016b) presented an alternative variational inequality formulation and studied its application in supply chain network with product differentiation. In this section we are going to present the static network oligopoly model. In this model there is Nash game among firms whose facilities and final demand markets are fixed at distinct nodes of a distribution network and connected by paths involving chains of arcs of that network. The network is represented by G(N, W ), in which N is set of nodes and W is set of origin-destination (OD) pairs (i, j). Also F is a set of firms competing on the network. Nf is the set of nodes at which firm f has economic presence, and Wf is the set of OD pair used by firm f to transport commodity. Moreover, sijf is the shipment of firm f for OD pair (i, j), rij is the freight rate (tariff) charged per unit of flow sijf for OD pair (i, j), cif is the f allocation of the output   of firm f  F at node i  N f , qi is output of firm f  F at f f node i  N f , V i qi is the variable cost of production for qif units. The best response problem for firm f is: ! X g f X X f f X  f f f f f f  max Θ f c , q , h ; c , q , h ¼ πi ci ci  V i qi  rij sijf iN f

gF

iN f

ði, jÞ  W f

ð10Þ such that cif ¼ qif þ

X ði, jÞ  W

sjif 

X

sijf

8i  N f

ð11Þ

ð j, iÞ  W

f f max max 0  cif  cmax i , 0  qi  qi , 0  sij  sij :

ð12Þ

Each firm is seeking to maximize its total profit (Eq. (10)) expressed as revenue less production and transportation cost. Constraint (11) guarantees that flow is conserved at each node. Moreover, per constraints (12), all consumption, production, and shipping variables are nonnegative and bounded from above. The static network oligopoly can be formulated as the following VI:     find c f , q f , s f  Ω

ð13Þ

Computable Models of Dynamic Spatial Oligopoly from the Perspective of. . .

309

such that 2 3   X     X X X    f f f f f f 4 ∇q f Θ f q i  q i ∇ c f Θ f ci  ci ∇s f Θ f sij  sij 5  0 þ þ f F iN f

i

iN f

for all (c, q, s)  Ω where Ω ¼

5

i

ði, jÞ  W f

ij

    

c , q , s : Eqs: ð11Þ, ð12Þ hold

Dynamic Network Oligopoly

We now turn to the problem of modeling and computing differential Nash equilibria among the oligopolistic firms. The oligopolistic firms of interest, embedded in a network economy, form a competitive oligopoly under the influence of dynamics that describe the trajectories of inventories/backorders and correspond to flow conservation for each firm at each node of the network. The oligopolistic firms, acting as shippers, perfectly compete as price takers in the market for physical distribution services. Perfect competition in shipping arises because numerous shipping companies serve numerous customers due to the involvement of shippers in the numerous output markets of the network economy. The time scale we consider is neither short nor long but rather of sufficient length to allow output and shipping pattern adjustments, while not long enough for firms to relocate, enter, or leave the network economy. This model, which takes the form of a differential variational inequality, is presented in this section. Dynamic network oligopoly models were previously studied by Brander and Zhang (1993), Nagurney et al. (1994, 2002, 2016), Wie and Tobin (1997), Friesz et al. (2006), Markovich (2008), Mozafari et al. (2015), and Chan et al. (2018). For the most part, these prior models assumed inventory was cleared during each time period of interest. To develop a more general mathematical formulation of network oligopoly, we follow the exposition in Friesz (2010).

5.1

Notation

Fortunately, much of the notation introduced in previous sections of this chapter is still relevant. Yet, because there are some subtle differences between the dynamic oligopoly model that we now study and problems explored previously in this chapter, we choose to give an exhaustive list of the notation to be employed below, even though that will involve some duplication. In particular, we again let continuous time be denoted by the scalar t ℜ1þ , initial time by t0  ℜ1þ, and final  1 time by t f  ℜþþ , with t0 < tf so that t  t0 , t f ℜ1þ . There are several sets important to articulating a model of oligopolistic competition on a network; these are as follows: F for firms, A for directed arcs, N for nodes, and W for origin-destination (OD) pairs. Subsets of these sets are formed as is meaningful by using the subscripts f for a specific firm, i for a specific node, and ij for a specific OD pair (i, j).

310

T. L. Friesz and A. H. Meimand

Each firm f ∈ F controls production (output) rates q^f, allocation of output to meet demand c^f, and shipping pattern s^f. Inventories I^f are state variables determined by the controls. In particular, concatenations of the firm-specific vectors c^f, q^f, and s^f give $c \in \bigl(L^2[t_0, t_f]\bigr)^{|N||F|}$, $q \in \bigl(L^2[t_0, t_f]\bigr)^{|N||F|}$, and $s \in \bigl(L^2[t_0, t_f]\bigr)^{|W||F|}$. Furthermore, the state operator, once again, will be

$$I(c, q, s): \bigl(L^2[t_0, t_f]\bigr)^{|N||F|} \times \bigl(L^2[t_0, t_f]\bigr)^{|N||F|} \times \bigl(L^2[t_0, t_f]\bigr)^{|W||F|} \to \bigl(H^1[t_0, t_f]\bigr)^{|N||F|}$$

where $L^2[t_0, t_f]$ is the space of square-integrable functions and $H^1[t_0, t_f]$ is a Sobolev space for the real interval $[t_0, t_f] \subset \Re^1_+$.

5.2 The Firm's Objective Functional, Dynamics, and Constraints

Each firm has the objective of maximizing net profit expressed as revenue less cost and taking the form of an operator acting on allocations of output to meet demands, production rates, and shipment patterns. For each firm f ∈ F, net profit is given by a functional of the form

$$\Phi_f\bigl(c^f, q^f, s^f;\, c^{-f}, q^{-f}\bigr) = e^{-\rho t_f} Z_f\bigl[I(t_f), t_f\bigr] + \int_{t_0}^{t_f} \ldots$$

… f_{sr} > 0 for s ≠ r. In this model, the accessible knowledge will change when any of the G-variables, travel sensitivities, and/or travel time distances change. Changes in the G-variables are the outcomes of knowledge generation processes, which can be illustrated by the following differential equation:

$$\dot{G}_r = H\bigl[g_r(A_r),\, \tau_r(Q_r),\, \theta_r(Q_r)\bigr] \qquad (2)$$

where g_r is the R&D productivity in region r, which is a function of the accessible knowledge in region r, namely A_r; τ_r is the share of output Q_r devoted to R&D in region r; and θ_r relates to the "learning-by-doing productivity" in region r, which is a function of total output in that region. The important message here is that regions rely not only on their own internal knowledge and capacity to generate knowledge but also on their capacity to integrate knowledge generated in other regions. If we also introduce a second dynamic equation that describes the accumulation of production capital K_r in region r, we get a simple model of multiregional development:

$$\dot{K}_r = s_r (1 - \tau_r) Q_r \qquad (3)$$

where sr signifies the investment coefficient in region r. We will not discuss here the equilibrium properties of this model. Instead, our comments are limited here to aspects of knowledge generation and knowledge flows. We start with discussing the knowledge generation process in a region r, G_ r . It seems reasonable to assume that large and rich (and dense) regions will have advantages in terms of knowledge generation. They have a large output, can afford to devote large resources to R&D, have a larger capacity for learning by doing due to a larger output, and have normally greater knowledge accessibility due to a large intra-regional knowledge pool and a good accessibility to other large regions with a large knowledge pool. What then can we say about knowledge flows and the potential for knowledge spillovers? It is obvious that the knowledge accessibility of a region is critical here. Regions with a high accessibility to knowledge-rich regions obviously have a higher potential for knowledge flows and pertinent knowledge spillovers. A high accessibility does not only imply a high probability for trade flows but also for direct investments and for interaction between people. These are all mechanisms that can generate knowledge spillovers and thus be a source for increasing returns in regional economic development.
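To see how Eqs. (2) and (3) interact, one can integrate them numerically for a stylized two-region system. Every functional form below (the output function, the exponential accessibility weights f_sr, and the specification of H) is an illustrative assumption, since the model deliberately leaves these unspecified.

```python
import numpy as np

# Minimal two-region simulation of the knowledge/capital dynamics (2)-(3).
# Assumed forms: output Q_r = K_r^0.3 * G_r^0.2; accessible knowledge
# A_r = sum_s exp(-lam * t_rs) G_s; H = sqrt(A_r) * tau_r * Q_r + 0.01 * Q_r.
t_rs = np.array([[0.0, 2.0], [2.0, 0.0]])    # travel time distances
lam, tau, s_inv = 0.5, np.array([0.05, 0.02]), np.array([0.2, 0.2])
G = np.array([10.0, 5.0])                    # initial knowledge stocks
K = np.array([50.0, 50.0])                   # initial capital stocks
dt = 0.1
for _ in range(1000):                        # forward-Euler integration
    Q = K**0.3 * G**0.2
    A = np.exp(-lam * t_rs) @ G              # accessibility weights f_sr = exp(-lam t_rs)
    G_dot = np.sqrt(A) * tau * Q + 0.01 * Q  # knowledge generation, Eq. (2)
    K_dot = s_inv * (1.0 - tau) * Q          # capital accumulation, Eq. (3)
    G, K = G + dt * G_dot, K + dt * K_dot
print(G.round(2), K.round(2))
```

Under these assumptions the knowledge-rich, R&D-intensive region pulls ahead over time, which is exactly the accessibility-driven cumulative advantage the text describes.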

4.3 Microeconomic Aspects of Knowledge Spillovers, Knowledge Externalities, and Regional Economic Development

Aggregated regional growth models offer a general understanding of some of the basic factors that drive regional economic development. These models illustrate the importance of R&D, human capital investments, endogenous technological change, exports, agglomeration economies, and knowledge and other spillovers from a theoretical perspective, but they are too aggregated for empirical testing. This implies that these models do not have enough capacity to discriminate between several factors in terms of their importance for regional productivity and general regional economic growth. These models also have problems in identifying the true exogenous factors, that is, the factors that are of interest for regional policymaking.

Another problem with the aggregated models is that they assume that all economic agents are homogeneous in all respects. Thus, these models fail to acknowledge that economic agents are heterogeneous in terms of their history, age, size, knowledge and other resources, location, networks, ownership structure, routines, strategies, and behavior, even if they belong to the same industry. Industrial and regional productivity growth and general economic growth are generated by many different individual processes within the different economic agents, which are influenced by different internal and external factors. In every industry, there are economic agents that persistently invest in R&D, while other economic agents invest now and then or not at all. The same is true for exports and imports. In terms of efficiency, we know that productivity distributions are characterized by a high degree of inertia and that the positions that the different economic agents occupy in the distribution are highly persistent over time.

To overcome these deficiencies and to increase our understanding of how knowledge spillovers and knowledge externalities influence regional economic development, we need a model framework at the microeconomic level capable of accommodating different conditions and different behavior among economic agents within the same industry. Thus, we need a model framework able to analyze the factors influencing the behavior of different economic agents as well as the effects of differing behavior among economic agents in terms of productivity improvements and market shares. This chapter does not present such a microeconomic framework; however, in the sequel, some aspects are highlighted that are critical for developing one.

An externality in production emerges when the output of one economic agent is influenced not only by its own inputs but also by the outputs and inputs used by other economic agents. When the externality emerges within the region where the economic agent is located, it has the character of a proximity externality. We talk about localization economies if other economic agents in the same trade generate the externality; if economic agents in other trades generate the externality, we talk about urbanization economies. If the externality emerges from outside the region, we can talk about an extra-market or network externality. If economic agents are engaged in two types of production – the production of goods and services and the production of knowledge – we can formulate two functions that describe the two processes, the first being the production function for economic agent i in industry j in region r:

$$q_{i,j,r} = A_{i,j,r}\bigl(c^q_{i,j,r}\bigr)\, f\bigl(x_{i,j,r},\, y_{j,r},\, z_{n,r},\, E_s\bigr) \qquad (4)$$

where q signifies the output; A_{i,j,r} the accessible knowledge for economic agent i in industry j in region r; c^q_{i,j,r} the absorptive capacity of economic agent i for production-relevant knowledge; x_{i,j,r} the inputs used by economic agent i; y_{j,r} the accessibility to other economic agents in the region in the same industry as economic agent i; z_{n,r} the accessibility to other economic agents in the region in industries other than i's; and E_s the accessibility to economic agents in all other regions s. The above formulation implies that the productivity level of economic agent i depends on its knowledge level, the size of the localization economies, the size of the urbanization economies, and the size of the extra-market externalities. What, then, about the generation of knowledge by economic agent i? Knowledge generation by economic agent i, in trade j in region r, is

$$\dot{g}_{i,j,r} = h\Bigl\{g_{i,j,r}\bigl(A_{i,j,r}\bigr),\, \dot{g}_{A_{j,r}},\, \dot{g}_{A_{n,r}},\, \dot{g}_{A_s},\, \theta_{i,j,r}(q_{i,j,r}),\, c^g_{i,j,r},\, \tau_{i,j,r}(q_{i,j,r})\Bigr\} \qquad (5)$$

where g_{i,j,r} is the research productivity of economic agent i in industry j in region r; \dot{g}_{A_{j,r}} is the accessibility to knowledge output from other economic agents in industry j in region r; \dot{g}_{A_{n,r}} is the accessibility to knowledge output in other industries in region r; \dot{g}_{A_s} is the accessibility to knowledge output in other regions s; θ_{i,j,r} is the "learning-by-doing productivity" of economic agent i; c^g_{i,j,r} is the absorptive capacity of economic agent i for R&D-relevant knowledge; and τ_{i,j,r} is the share of the output of economic agent i devoted to R&D.

Equations (4) and (5) tell us that an economic agent combines its current knowledge with new knowledge developed by economic agents in the same industry and in other industries in the home region, with new knowledge developed in other regions, and with knowledge gained from its own production experience. The three knowledge accessibilities included represent three potentials for knowledge flows and thus for knowledge spillovers. However, the probability that each economic agent succeeds in the knowledge generation game depends, among other things, upon its skills and capacity in interacting with other economic agents, including, for example, research universities, in the home region as well as in other regions. It also depends upon the extent to which the location offers suitable interaction infrastructures and meeting places and formal and informal institutions, including both a potential to protect the newly generated knowledge and the necessary level of trust for successful interaction between economic agents. The formulation allows for intra-market, quasi-market, and extra-market spillovers in line with the earlier discussion in this chapter. Concerning the extra-market spillovers, we want to stress the role that exports and imports play as potential channels for knowledge spillovers. These spillovers can be direct, but they might also be indirect, coming via other exporting or importing firms in the region.

4.4 Entrepreneurship, Clusters, and Regional Economic Development

The framework described above is partial. We need to complement it with a discussion of how knowledge spillovers influence the dynamics of entry, growth, decline, and exit of firms. Different regions have varying endowments of knowledge, not least new knowledge, which implies that the potential for innovative entry differs between regions and that larger regions generally offer a higher potential, since they also offer a larger market potential. Of course, individuals and groups of individuals differ in their capacity to discover, create, and exploit innovations, i.e., to create new combinations, but since individuals in larger regions generally have higher education, more varied work experiences, and more extensive personal networks, these regions have an advantage here too. However, it is important to dig deeper into this and discuss especially the emergence and dynamics of industrial clusters (Klepper 2007).

The emergence of an industrial cluster is a manifestation of spillovers of new knowledge that materialize through the entry of new firms via three processes: (i) firms in an established industry starting new firms in a new but related industry, (ii) employees in an established industry starting new firms in a new but related industry, and (iii) individuals without experience from a related industry starting a de novo firm. In the first two cases, the new knowledge used to start firms in a new industry represents a special form of knowledge spillover. The de novo firms, which by definition have done no R&D of their own, are dependent on spillovers of scientific, technological, and entrepreneurial knowledge from established industries and/or research institutions. Since the early entrants are built upon knowledge mainly coming from related industries in the same region, the emergence of new clusters can be described as an evolutionary branching process (Boschma and Frenken 2011). Even if an industrial cluster may emerge based upon the start-up of a firm in an unrelated industry, most new industrial clusters tend to emerge out of related industries in the region. Also, regions have a higher probability of developing a comparative advantage in a new industry if a neighboring region is specialized in that industry (Boschma et al. 2017). If economic agents have access to regional knowledge, they are more likely to create economic activities that are related to existing activities in the region. However, diversification and structural change generally require that local economies develop activities that utilize new knowledge through new establishments, especially via those with nonlocal backgrounds (Neffke et al. 2018).

Future generations of firms in an industrial cluster tend mainly to be spin-offs from established firms in the cluster, where employees in these firms become entrepreneurs through an entrepreneurial spawning process, bringing with them relevant knowledge from the actual industry. Of course, the motives behind the decision to spin off from an existing firm vary. However, the decision to leave an
incumbent firm and establish a spin-off is often Schumpeterian in nature, in the sense that the decision is based upon a trade-off between pursuing a new business idea via a new firm or trying to have the business idea developed within the incumbent firm where the person is employed. In this case, a person may become an entrepreneur out of frustration over the new business idea being rejected by the current employer – a frustration that might be more prevalent in larger firms (Klepper 2007). An alternative scenario is when an employee perceives a business idea as so profitable that he or she has strong incentives to become an entrepreneur. Spin-offs have an advantage over other types of firm entry in the sense that the entrepreneur has industry-specific knowledge and experience, which makes it easier to overcome many entry barriers and to compete with incumbent firms (Frenken et al. 2015). These spin-off firms are normally located close to the existing firms in the cluster, because the new entrepreneurs are already established in or close to the locality. They have their families there as well as much of their social, professional, and business networks. They also have location-specific knowledge, and they are known in the locality, which makes it easier for them to find employees, suppliers, customers, financing, etc.

Interestingly, successful spin-offs give rise to new generations of spin-offs, but over time competition increases, and exit rates tend to rise. Often, older firms in the cluster suffer in this process, with the result that over decades the stock of cluster firms is more or less totally renewed. The firms most capable of developing, producing, and delivering the products that best fit a market demand that changes over time are those that survive, grow, and in turn produce the most successful spin-offs, in a process that is evolutionary in nature (Boschma and Frenken 2011). Thus, clusters will exhibit creative destruction along Schumpeterian lines. Clusters are the result of historical location choices, and they develop along a path-dependent spatial process.

The spillover of knowledge between different generations of firms in an industrial cluster can be said to be the result of the entrepreneurial spawning process. However, this spillover of knowledge and routines is better described as a passive process than as an active one. Spin-offs in a cluster in a sense inherit the industry-specific knowledge of their parent firms, which is a specific form of embedded knowledge spillover. Furthermore, since cluster firms are cognitively and geographically proximate, knowledge will flow and spill over between cluster firms via social, business, and organizational networks. An interesting effect of this persistence in the location of clusters is that cluster-specific knowledge, due to the location behavior of spin-offs, is kept within the cluster, making it difficult for new clusters to compete with established clusters, i.e., intercluster knowledge spillovers are kept at a low level. Regions with one or several industrial clusters may, given that the market, infrastructure, and institutional conditions are good enough, experience a process where existing clusters give birth to new clusters in related industries, which will increase regional industrial diversification.
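The spawning-and-exit dynamic just described can be illustrated with a toy agent-based simulation. This is not a model from the chapter or from the cited literature; the spawn and exit rates, the fitness-inheritance rule, and the crowding term are all hypothetical parameters chosen only to display the qualitative pattern of cluster renewal through spin-offs.

```python
import random

random.seed(42)

def simulate_cluster(periods=50, spawn_rate=0.15, base_exit=0.02,
                     crowding=0.004):
    """Toy spin-off dynamic: incumbents spawn spin-offs in proportion to
    their fitness; the exit hazard rises as the cluster gets crowded.
    All rates and the inheritance rule are hypothetical."""
    firms = [{"born": 0, "fitness": random.random()} for _ in range(5)]
    for t in range(1, periods + 1):
        spin_offs = []
        for firm in firms:
            if random.random() < spawn_rate * firm["fitness"]:
                # spin-offs inherit a noisy copy of parent knowledge
                fitness = min(1.0, max(0.0,
                                       firm["fitness"] + random.gauss(0, 0.1)))
                spin_offs.append({"born": t, "fitness": fitness})
        # competition: exit hazard grows with the number of firms and
        # falls with firm fitness
        hazard = base_exit + crowding * len(firms)
        firms = [f for f in firms
                 if random.random() > hazard * (1.0 - f["fitness"])]
        firms.extend(spin_offs)
    return firms

survivors = simulate_cluster()
founders = sum(f["born"] == 0 for f in survivors)
print(f"{len(survivors)} firms survive; {founders} are founding-generation")
```

Under these assumed rates, the founding generation is largely replaced over the simulated decades while fitter lineages keep spawning, which mirrors the stylized renewal pattern described above.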
In this connection it is important to observe that knowledge flows and spillovers not associated with spin-offs from “parent” companies operate at a fine geographic scale and tend to attenuate sharply with geographical distance. The reason is that learning effects between individuals
require a high degree of geographical proximity, which is normally associated with neighborhoods. Observed regional effects are the outcome of a number of neighborhood effects operating in parallel, which implies that metropolitan regions normally gain more from knowledge externalities, since they contain more neighborhoods. However, it is also important to observe that the microeconomic learning effects may operate at different spatial scales in different industries (Andersson et al. 2016).
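The sharp attenuation with distance can be captured with a simple distance-decay accessibility measure. The exponential form and the decay parameter below are illustrative assumptions; as noted above, the empirical rate of attenuation differs across industries.

```python
import math

def knowledge_accessibility(own_stock, sources, decay=1.5):
    """Accessibility to external knowledge with exponential distance decay.

    sources: list of (knowledge_stock, distance_km) pairs. The decay
    parameter is purely illustrative; empirically the rate of attenuation
    differs across industries (cf. Andersson et al. 2016).
    """
    return own_stock + sum(k * math.exp(-decay * d) for k, d in sources)

# knowledge within the neighborhood dominates: a large stock 30 km away
# contributes almost nothing once the decay is applied
neighborhood = [(10.0, 0.5), (8.0, 1.0)]
rest_of_region = [(50.0, 30.0)]
print(knowledge_accessibility(5.0, neighborhood + rest_of_region))
```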

5 Knowledge Externalities and Regional Economic Development: Policy Conclusions

Regional economic development seems increasingly to depend upon internal network structures that fully exploit the internal knowledge base while at the same time absorbing new knowledge from outside the region. Given the potentially large importance of knowledge spillovers for regional (and also national) economic development, policymakers have strong incentives to shift part of their traditional focus on the share of gross regional product (GRP) and gross domestic product (GDP) invested in R&D toward measures that foster knowledge spillovers, intra-regional as well as inter-regional.

Concerning stimuli to intra-regional knowledge spillovers, policymakers should first focus on increasing intra-regional accessibility by means of investments in transport infrastructure and public transport. It is also important to establish arenas and meeting places where economic agents in the region can meet and interact.

Even if one should not underestimate the importance of intra-regional knowledge spillovers, it seems important to promote inter-regional knowledge spillovers as well. There are certainly many channels for inter-regional knowledge flows. Starting with academic channels, we claim that it is important to involve scientists from other regions and countries in advanced research programs and to encourage scientists to engage in inter-regional and international cooperation and co-authorships as well as to spend time as guest researchers at research universities and laboratories in other countries. With a rapidly increasing number of patents and academic publications, it becomes more and more important to support inventors in all fields with rapid updates on new patents and publications. Small- and medium-sized firms might need support when buying patents and licenses from other regions as well as in finding the right consultants in other regions. A positive and supportive attitude toward strategic R&D cooperation with economic agents in other regions is also important.

The role of imports of high-technology and knowledge-intensive products for inter-regional and international knowledge flows is often neglected. For regions that want to maximize the potential for inter-regional knowledge spillovers, it is important to create good conditions for import activities, not least through high inter-regional accessibility. The same is true for direct investments from other regions; multinational firms play an important but probably underestimated role in inter-regional knowledge spillovers. Another important channel for inter-regional knowledge flows is the in-migration of highly educated and skilled labor. Not least, it is important to create
an institutional framework that makes it easy for in-migrants to find information about available jobs, to apply for jobs, to find housing, etc.

6 Conclusions

There exists today a large literature on knowledge flows that mainly uses the concept of knowledge spillovers, often without making any distinction between intended and unintended knowledge flows. The literature is not clear about the relative roles of intended, i.e., market-based, knowledge flows versus unintended knowledge flows, i.e., spillovers. This distinction is important, since only the latter type generates technological knowledge externalities. The discussion of knowledge externalities in the literature has focused on technological spillovers, but it is obvious that there also exist pecuniary knowledge externalities. Thus, there is substantial confusion in the literature about the economic nature of knowledge externalities.

A similar confusion concerns the sources of knowledge spillovers. Many empirical studies have analyzed the role of patent citations, but it is obvious that economic agents learn from many sources other than patents. Another controversy concerns who benefits from knowledge spillovers: is it mainly economic agents in the same industry, or mainly economic agents in other industries? In terms of the mechanisms and channels for knowledge spillovers, it is obvious that learning takes place through many channels, but most studies only consider one channel at a time. A hot topic is the spatial reach of knowledge spillovers. It is a common statement in the literature that knowledge spillovers are bounded in geographical space, a statement that needs to be seriously questioned in the era of the Internet, global air connections, satellite TV channels, and high volumes of international trade, direct investment, travel, and labor mobility. Furthermore, the consequences of knowledge spillovers and knowledge externalities are not well understood, disentangled, and analyzed.

This state of the art certainly complicates analyses of the role of knowledge spillovers and knowledge externalities for regional economic development. A central problem is that most studies are done at an aggregated level, assuming that there is only one type of knowledge and that all economic agents are equal and benefit to the same extent from knowledge spillovers. This implies that researchers in the field today have too little understanding to guide policymakers. What we need in the future are multilevel studies based upon microeconomic data for economic agents, studies that recognize that economic agents and regions are heterogeneous and that different economic agents use diverse types of knowledge, use different knowledge channels, differ in their absorptive capacity, and so on.

7 Cross-References

▶ Cities, Knowledge, and Innovation
▶ Clusters, Local Districts, and Innovative Milieux
▶ Generation and Diffusion of Innovation
▶ Knowledge Spillovers Informed by Network Theory and Social Network Analysis
▶ Networks in the Innovation Process
▶ Systems of Innovation and the Learning Region
▶ The Geography of Innovation
▶ The Geography of R&D Collaboration Networks

References

Andersson ÅE, Beckmann MJ (2009) Economics of knowledge: theory, models and measurement. Edward Elgar, Cheltenham
Andersson ÅE, Mantsinen J (1980) Mobility of resources: accessibility of knowledge and economic growth. Behav Sci 25:353–366
Andersson M, Klaesson J, Larsson JP (2016) How local are spatial density externalities? Neighbourhood effects in agglomeration economies. Reg Stud 50:1082–1095
Antonelli C (2013) Knowledge governance, pecuniary externalities and total factor productivity growth. Econ Dev Q 27:62–70
Audretsch DB, Feldman MP (1999) Innovation in cities: science-based diversity, specialization and localised competition. Eur Econ Rev 43:409–429
Boschma RA, Frenken K (2011) Technological relatedness and regional branching. In: Bathelt H, Feldman MP, Kogler DF (eds) Dynamic geographies of knowledge creation and innovation. Routledge, London, pp 64–81
Boschma RA, Martín V, Minondo A (2017) Neighbour regions as the source of new industries. Pap Reg Sci 96:227–245
Braunerhjelm P, Ding D, Thulin P (2018) The knowledge spillover theory of intrapreneurship. Small Bus Econ 51:1–30
Breschi S, Lissoni F (2001) Localized knowledge spillovers vs. innovative milieux: knowledge tacitness reconsidered. Pap Reg Sci 80:255–273
Caragliu A, de Dominicis L, de Groot HLF (2016) Both Marshall and Jacobs were right! Econ Geogr 92(1):87–111
Chambers RG (1988) Applied production analysis: a dual approach. Cambridge University Press, Cambridge
Cohen WM, Levinthal DA (1989) Innovation and learning: the two faces of R&D. Econ J 99:569–596
Duranton G, Puga D (2004) Micro-foundations of urban agglomeration economies. In: Henderson JV, Thisse J-F (eds) Handbook of regional and urban economics, vol 4. Elsevier, Amsterdam, pp 2063–2117
Frenken K, Cefis E, Stam E (2015) Industrial dynamics and clusters: a survey. Reg Stud 49:10–27
Glaeser EL et al (1992) Growth of cities. J Polit Econ 100:1126–1152
Griliches Z (1992) The search for R&D spillovers. Scand J Econ 94(Suppl):29–47
Grossman GM, Helpman E (1991) Quality ladders and product cycles. Q J Econ 106:557–586
Huggins R, Thompson P (2015) Entrepreneurship, innovation and regional growth: a network theory. Small Bus Econ 45:103–128
Jacobs J (1969) The economy of cities. Random House, New York
Jaffe A, Trajtenberg M, Henderson R (1993) Geographic localization of knowledge spillovers as evidenced by patent citations. Q J Econ 108:577–598
Johansson B (2005) Parsing the menagerie of agglomeration and network externalities. In: Karlsson C, Johansson B, Stough RR (eds) Industrial clusters and inter-firm networks. Edward Elgar, Cheltenham, pp 107–147
Johansson B, Lööf H (2014) Innovation strategies combining internal and external knowledge. In: Antonelli C, Link AD (eds) Routledge handbook of the economics of knowledge. Routledge, London, pp 29–52
Karlsson C, Manduchi A (2001) Knowledge spillovers in a spatial context – a critical review and assessment. In: Fischer M, Frölich J (eds) Knowledge, complexity and innovation systems. Springer, Heidelberg, pp 101–123
Kemeny T, Storper M (2015) Is specialization good for regional economic development? Reg Stud 49(6):1003–1018
Klepper S (2007) Disagreements, spinoffs and the evolution of Detroit as the capital of the U.S. automobile industry. Manag Sci 53:616–631
Krugman PR (1991) Geography and trade. Leuven University Press, Leuven
Lundvall B-Å (ed) (1995) National systems of innovation – towards a theory of innovation and interactive learning. Biddles, London
Miguélez E, Moreno R (2015) Knowledge flows and the absorptive capacity of regions. Res Policy 44:833–848
Neffke F, Hartog M, Boschma R, Henning M (2018) Agents of structural change: the role of firms and entrepreneurs in regional diversification. Econ Geogr 94(1):23–48
Paci R, Marrocu E, Usai S (2014) The complementary effects of proximity dimensions on knowledge spillovers. Spat Econ Anal 9(1):9–30
Palander T (1935) Beiträge zur Standorttheorie. Almqvist & Wiksell, Uppsala
Puga D (2010) The magnitude and causes of agglomeration economies. J Reg Sci 50:203–219
Romer P (1990) Endogenous technological change. J Polit Econ 98:71–102
Roper S, Love JH, Bonner K (2017) Firms' knowledge search and local knowledge externalities in innovation performance. Res Policy 46:43–56
Torre A, Rallet A (2005) Proximity and localization. Reg Stud 39:47–59

Knowledge Spillovers Informed by Network Theory and Social Network Analysis

Maryann P. Feldman and Scott W. Langford

Contents
1 Introduction
2 Knowledge Spillovers
3 Application of Network Theory to Knowledge Spillover Research
4 Conclusions and Future Research
5 Cross-References
References

Abstract

Researchers have applied network theory and social network analysis methods to address questions developed within the field of knowledge spillovers. This research has the potential to illuminate the channels and mechanisms related to knowledge flows, especially as they relate to geography and the geographic concentration of innovation. This chapter has three objectives: first, to review prior findings within the field of knowledge spillovers, with a focus on existing findings that could be informed by network theory and social network analysis; second, to review studies that have applied network theory and social network analysis methods to knowledge spillovers and related fields; and finally, to suggest future research directions within the field of knowledge spillovers that can be informed by network theory and social network analysis methods.

Keywords

Network theory · Knowledge spillovers · Social network analysis

M. P. Feldman (*) · S. W. Langford
Department of Public Policy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
e-mail: [email protected]; [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_110
1 Introduction

A researcher's success is based on their ability to develop new knowledge. In turn, their ability to develop new knowledge depends on their ability to work together with others and exchange knowledge. Knowledge spillovers have been defined in multiple ways; the consensus, however, is that they are difficult to measure and capture but have real economic impacts (see also chapter ▶ "Knowledge Flows, Knowledge Externalities, and Regional Economic Development" by Karlsson and Gråsjö in this volume). Our conception follows Griliches (1992, p. 36), who described knowledge spillovers as "ideas borrowed by research teams of industry i from the research results of industry j". At the heart of our conception of knowledge spillovers is the transfer of knowledge between two agents working on similar things. The predominant benefits of knowledge spillovers likely involve tacit information, that is, information that cannot be transmitted through simple communication in relatively complete form without loss of integrity.

Within the field of knowledge spillovers, researchers have identified the actors, which are either individuals or teams of individuals that send and receive information. While communication occurs at the individual level, researchers often conduct analysis on aggregated forms of these actors, such as firms or regions, which can be more readily observed. Further, researchers have identified the channels through which knowledge spillovers flow, such as labor mobility and informal contacts. While prior reviews (Feldman 1999; Feldman and Kogler 2010) examined the units of observation, such as patents, and the channels through which knowledge spillovers flow, such as labor mobility, this chapter focuses on how network theory and social network analysis can be applied to knowledge spillover research. For a discussion of the role of networks in the overall innovation process, see chapter ▶ "Networks in the Innovation Process" by Tranos in this handbook. For a more extensive discussion of social network analysis, see chapter ▶ "Social Network Analysis" by Medina and Waters in this handbook.

Other methods have been used to examine knowledge spillovers, including econometrics and case studies, and these have contributed significantly to our understanding. Our position here is that network theory and social network analysis are complementary to these methods rather than superior. Whereas econometric analyses require large samples and case studies rely on small ones, network analysis can be conducted over a range of sample sizes.

To apply network theory and social network analysis to knowledge spillover research, researchers have examined the relationships between distinct pieces of knowledge, constructed network structures of industries and international trade, and examined the effect of network structure and position on knowledge generation. Researchers have used indirect outcomes to measure the effect of network position on knowledge spillovers or have used the theoretical lenses of network theory to interpret observations made using social network analysis. Looking forward, the field of knowledge spillovers would benefit from using network theory to construct hypotheses that can be tested using observations made with social network analysis methods and more direct measures of knowledge spillovers.
The remainder of this chapter proceeds as follows. In Sect. 2, we provide a more extensive description of knowledge spillovers and review prior theoretical and empirical contributions to our understanding of them. In Sect. 3, we review how network theory and social network analysis have been applied to knowledge spillover research. Finally, in Sect. 4, we consider future directions for the application of network theory and social network analysis to knowledge spillover research and present conclusions.

2 Knowledge Spillovers

Knowledge spillovers are inherently difficult to observe, as knowledge transfer is embodied in conversations, face-to-face interactions, and serendipitous exchanges between individuals. While it is difficult to directly observe interactions between individuals, it is easier to observe aggregated units, such as firms or universities, through formalized mechanisms such as strategic alliances. Although directly observing knowledge generation and transfer is difficult, knowledge can be observed at multiple points as it is transformed into economically valuable goods through the innovation process, for example, at publication in an academic journal.

The innovation process represents the pathway from knowledge generation to an economically valuable product. At the first step, discovery, an individual or team of individuals goes through the cognitive process of generating new knowledge. After discovery, the individual or team may seek to have the knowledge published in an academic journal or protected via a patent. Publication in an academic journal indicates that the knowledge generated is sufficiently novel to warrant publication; however, this knowledge may or may not be economically valuable. Patents represent invention, the point at which an idea has been recognized as potentially economically valuable and novel but has not yet been transformed into a product. In the final step of the process, the generated knowledge is transformed into an economically valuable good, or innovation: firms test and validate the initial knowledge, eventually developing a product. Each of these artifacts can be observed. To clarify the distinction between patent activity and new product introductions, we will here refer to measures of patent activity as invention activity, whereas measures of new product introductions will be referred to as innovation activity.

To illustrate the innovation process, consider an academic scientist conducting research with commercial potential. When designing experiments, the researcher identifies knowledge that is both sufficiently novel to expand their knowledge base and sufficiently similar to be useful. This knowledge is identified through transfer mechanisms such as consulting journals, attending conferences, and informal social contacts. When the idea is developed to the point where it is sufficiently novel to contribute substantially to the existing body of knowledge, the researcher may seek to have the research published in an academic journal. The citations the author selects provide an artifact through which the previously described knowledge transfer mechanisms can be observed. When the researcher develops the idea to the point that it is potentially
economically valuable, the researcher may file for a patent. Again, the citations provided in the patent application provide an artifact of the previously described knowledge transfer mechanisms. Notably, each of these artifacts is imperfect, as not all information transferred will be observed through them. After the patent is successfully obtained, the scientist then transfers the patent to a firm. This firm validates the idea underlying the patent and adds value by developing the idea into a novel product – a final artifact of the innovation process.

To illustrate the concept of knowledge spillovers, consider the scenario where the researcher attends an academic conference. The researcher discusses recent results with a colleague and learns of a potential solution to a long-standing problem. Assuming that the colleague was not compensated for this information, a knowledge spillover has occurred. Here the actors are the colleagues, and the channel through which knowledge was transferred is informal communication. Although it is difficult to directly observe this knowledge transfer, the artifacts of the transfer can be observed through article and patent citations.

For knowledge transfer to occur between two agents, two conditions are required. First, the knowledge transferred must be sufficiently similar to the existing knowledge base to be useful to the receiving agent, yet sufficiently novel to add to that knowledge base. Alternatively phrased, the cognitive proximity of the received knowledge and the existing knowledge base must meet specific conditions. Cognitive proximity places knowledge into a spatial framework, where knowledge that is more similar is closer, while knowledge that is less similar is farther apart. For knowledge to be transferred, it must lie within a range of distances such that it is both sufficiently novel and useful. Second, the two agents must come into contact with one another. This contact could occur through either personal or professional mechanisms. Here geographic proximity is not required. However, individuals who come into contact with one another are often physically close, or have been previously. Thus, geographic proximity is often also observed.

At various points along the innovation process, data artifacts, such as academic publications, patents, and novel products, can be observed. These provide measures of a given step in the process. Patents have been used extensively, although several limitations are well documented. The primary limitation of patents is that they reflect inventive activity rather than innovative activity. In short, not all patents become innovations, while not all innovations are derived from knowledge that has been patented. To this point, Feldman (1994) observed that the correlation between the location of new products introduced to the market and broad patent categories is 0.8. New product introductions also have limitations as data artifacts, the primary one being that the process for identifying and validating novel products is not as generally available. Thus, innovations are less frequently used to observe knowledge spillovers.

The conception of knowledge spillovers has evolved to consider different types of proximity. Proximity describes the distance between two points. Here, three conceptions of proximity have been utilized (geographic, social, and cognitive) to describe corresponding distances in space, relationships, and knowledge.
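One common way to make cognitive proximity operational, not specific to this chapter, is to represent each agent's knowledge base as a vector (for example, patent counts by technology class) and measure similarity between vectors. The sketch below uses cosine similarity and an assumed "useful but novel" proximity band; both the similarity measure and the band limits are illustrative choices, not estimates from the literature.

```python
import numpy as np

def cognitive_proximity(a, b):
    """Cosine similarity between two knowledge-portfolio vectors,
    e.g., patent counts by technology class."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def transfer_possible(a, b, lower=0.3, upper=0.9):
    """Transfer requires an intermediate proximity band: similar enough
    to be useful, distant enough to be novel. The band limits are
    illustrative assumptions, not empirical estimates."""
    return lower <= cognitive_proximity(a, b) <= upper

sender = [5, 3, 0, 1]      # patent counts in four technology classes
receiver = [4, 2, 1, 0]
print(cognitive_proximity(sender, receiver),
      transfer_possible(sender, receiver))
```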
Initial research established the role of geographic proximity. This research established
that the probability of the transfer of knowledge between agents increases as the distance between agents decreases. Subsequent research showed that social proximity drives this effect. Social proximity describes the distance between individuals in terms of relationships. For example, two individuals who are currently employed at the same firm and work together would be considered socially proximate. This stream of research indicates that information is more likely to be transferred between individuals who come into contact with one another socially or interact due to serendipity. Often, individuals who are socially proximate are also geographically proximate; thus, the observed geographic proximity effect was driven by social proximity. Finally, researchers have examined the role of cognitive proximity, which uses a physical framework to describe the relationship between pieces of knowledge. To illustrate, consider the sets of knowledge required to design various products: a chair, a table, and a pharmaceutical product. The cognitive proximity, or distance, between the knowledge required to design the chair and the table is less than that between the knowledge required to design the chair and the pharmaceutical product. This research stream shows that the probability that a region enters an industry increases with prior experience with similar knowledge. In sum, as the field of knowledge spillovers has evolved, our conception of proximity has shifted. At this point, the conditions necessary for knowledge spillovers have been established as cognitive proximity and social proximity, the latter of which is frequently associated with geographic proximity.

Here, we outline the channels through which knowledge spillovers flow and the artifacts used to observe the knowledge transferred through each channel. Initial research provided evidence of the geographic localization of knowledge spillovers, although the channel through which these spillovers flowed was unclear. Jaffe and colleagues (1993) use patent citations to show that knowledge spillovers are geographically localized. That is, the probability that a given patent cites another patent increases as the physical distance between the patent locations decreases. The primary implication of this research is that patent assignees are more aware of local patents; however, it is unclear whether this occurs through social contact or another mechanism.

Subsequent studies have further refined these results. Alcácer and Gittelman (2006) observe that patent citations are added by both the patent assignee and the examiner. Given this, it is not clear that the observed geographic localization is caused by the patent assignee. The authors show that examiners add the majority of citations, and these additions may bias or overinflate the significance of prior results. Thompson and Fox-Kean (2005) observe that Jaffe and colleagues (1993) constructed their control based on the relatively broad 3-digit United States Patent and Trademark Office classification; thus, intra-class heterogeneity may have caused the observed geographic effect. To investigate this potential alternative mechanism, Thompson and Fox-Kean (2005) constructed a more refined control, which uses the technology subclass. This eliminates intra-national effects, although inter-national effects remain. Further studies provided evidence that labor mobility is the channel through which knowledge flows and the driving force of the observed geographic localization.
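The matched-control logic of these localization tests can be illustrated with a toy sketch: for each citing-cited pair, draw a control patent from the same technology class and compare co-location rates. The data below are synthetic, and the localization effect is built in by assumption; the real studies use U.S. patent microdata, and the Thompson and Fox-Kean (2005) refinement amounts to matching on much finer technology subclasses than sketched here.

```python
import random

random.seed(0)

def colocation_rate(pairs):
    """Share of (citing, cited) location pairs in the same metro area."""
    return sum(a == b for a, b in pairs) / len(pairs)

# toy patents: (metro_area, technology_class)
patents = [(random.choice("ABCDE"), random.choice("xyz")) for _ in range(2000)]

citation_pairs, control_pairs = [], []
for citing in patents[:500]:
    # actual citation, biased toward the citing patent's own metro area
    # (the localization effect is assumed here, not discovered)
    cited = citing if random.random() < 0.3 else random.choice(patents)
    citation_pairs.append((citing[0], cited[0]))
    # matched control: a random patent from the cited patent's class
    same_class = [p for p in patents if p[1] == cited[1]]
    control_pairs.append((citing[0], random.choice(same_class)[0]))

print(f"citation co-location rate: {colocation_rate(citation_pairs):.2f}")
print(f"control co-location rate:  {colocation_rate(control_pairs):.2f}")
```

A citation co-location rate well above the control rate is the signature of localization; if finer class matching closes the gap, intra-class heterogeneity rather than geography was driving it.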
Agrawal et al. (2006) present evidence that patent assignees disproportionately cite patents in locations where they were previously located. This effect was particularly strong for cross-field knowledge spillovers, which suggests that knowledge spillovers occur through either social ties or labor mobility. Research conducted by Breschi and Lissoni (2009) indicates that after controlling for inventor mobility, which results in co-invention networks, the effect of geographic proximity on knowledge diffusion is reduced. In sum, this stream of research established that knowledge spillovers are geographically localized and that one mechanism driving this geographic localization is labor mobility. This highlights the usefulness of applying social network analysis techniques to further refine observations made within the field of knowledge spillovers.
university through several mechanisms, including informal contacts or conference presentations. Here the observed data artifacts are primarily academic publications, though patents serve as a secondary data artifact. A research stream has examined the effect of universities on regional invention and innovation activity. Feldman (1994) concludes that the positive effect of university research and development expenditures on innovation activity is greater than its effect on invention activity, while the positive effect of industrial research and development expenditures on invention activity is greater than its effect on innovation activity. This appears to be driven by differential firm size effects. Furthermore, Adams (2002) presents results indicating that knowledge transfer between academic and industrial research and development laboratories and firms is more localized for innovations than patents. In sum, this stream of research indicates that universities have a positive and localized, though differential, effect on invention and innovative activity. Researchers have also used academic publication citations to provide descriptive analyses of academic fields. Although this research does not explicitly use the knowledge spillover conceptual framework, it does point towards the potential use of applying network theory and social network analysis to examine knowledge spillovers within the academic setting. Strategic alliances have also been identified as a channel through which knowledge spillovers flow. These are defined as relationships between firms conducted in a formal and co-operative manner, and involve inter-firm resource transfer or co-development (Phene and Tallman 2014). Owen-Smith and Powell (2004) apply social network analysis methods to formal inter-firm connections within the Boston biotechnology sector. They observe that membership in the loosely linked, coherent component of the network has a positive effect on invention activity, while betweenness centrality has a negative effect. While this does not provide direct evidence of knowledge spillovers, it does highlight the role that strategic alliances play in firmlevel invention activity. Foreign direct investment represents firms investing in other firms across international borders. Presumably, this investment is coupled with human capital and intellectual property investments, which transfer knowledge. Branstetter (2006) examined Japanese multinational corporations investing in the United States, and showed that foreign direct investment had a positive effect on the inventive activity of both United States and Japanese firms. This research provides indirect evidence of knowledge spillovers, as the study measured the effect of investment on inventive activity, rather than observing the knowledge transfer itself. International trade represents the transfer of goods between individuals across international borders. Although this occurs between individuals, it has been observed at the country level. Coe and Helpman (1995) conclude that both domestic and foreign research and development capital have a positive effect on the total factor productivity of a country. Further, the effect of foreign research and development capital is moderated by the openness of the country to foreign trade. Notably, the observed outcome here – economic activity – is an indirect measure of innovative activity. Further, subsequent research, which used Monte-Carlo
simulations to construct a random set of trade patterns as a counterfactual, demonstrated that the counterfactual patterns generated similar results (Keller 1998). Thus, although this research stream initially pointed to international trade as a channel through which knowledge spillovers flowed, the supporting empirical evidence is mixed.
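The mechanics of that counterfactual can be sketched as follows: compute an import-share-weighted foreign R&D stock, then recompute it after randomly reassigning each country's import shares to permuted partners. Everything below is synthetic toy data; the actual studies regress total factor productivity on each measure, a step not shown here.

```python
import random

random.seed(1)

countries = list(range(10))
rd_stock = {c: random.uniform(1.0, 10.0) for c in countries}

# import shares: how much country c imports from each partner p
imports = {c: {p: random.random() for p in countries if p != c}
           for c in countries}

def foreign_rd(weights):
    """Import-share-weighted foreign R&D stock, Coe-Helpman style."""
    return {c: sum(w * rd_stock[p] for p, w in row.items())
            for c, row in weights.items()}

actual = foreign_rd(imports)

# Keller-style counterfactual: reassign each country's import shares to
# a random permutation of its trade partners
counterfactual = {}
for c, row in imports.items():
    partners = list(row)
    shuffled = random.sample(partners, len(partners))
    counterfactual[c] = {new: row[old] for old, new in zip(partners, shuffled)}

cf = foreign_rd(counterfactual)
for c in countries[:3]:
    print(f"country {c}: actual {actual[c]:.2f}, counterfactual {cf[c]:.2f}")
```

If regressions on the actual and permuted measures yield similar coefficients, the trade weights carry little identifying information, which is the essence of Keller's (1998) critique.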

3 Application of Network Theory to Knowledge Spillover Research

The following section reviews research that has applied network theory and social network analysis to knowledge spillover research. For readers not familiar with network theory and social network analysis, Table 1 provides definitions of key concepts. For a more extensive discussion of social network analysis, see chapter ▶ "Social Network Analysis" by Medina and Waters in this handbook.

Researchers have examined the connections between types of knowledge using the principle of relatedness. Here nodes represent pieces of knowledge, while the connections between them represent their similarity: knowledge that is more similar lies at a shorter distance. This concept is often applied to actors, such as regions; actors with greater relatedness will have greater overlap in their knowledge bases. Several measures, including the co-export of products, inter-industry labor flows, and common labor pools, have been utilized to observe this (Hidalgo et al. 2018). Hidalgo and colleagues (2007) use a proximity measure to construct a "product space," or a network of products. They observe the proximity between a country's existing productive structure and a potential product and use this to measure the effect of product proximity on the probability that a country will enter a productive arena (a minimal sketch of this measure follows at the end of this passage). Rigby (2015) used patent citations to examine changes in regional knowledge bases and showed that a metropolitan statistical area's pattern of technological diversification depends on the existing knowledge structure. Here the network effect, or co-inventor knowledge flows from other cities, is weaker than the local cognitive proximity effect, or the relatedness of current knowledge in a region. In sum, researchers have demonstrated that cognitive proximity can be used to predict the emergence of a technology or product in a region.

Strategic alliances have also been examined using social network analysis and patent citations. At the individual level, researchers have estimated the effect of network position on firm invention activity; at the network level, researchers have examined network structure, providing information on the efficiency of knowledge transfer. At the individual level, Ahuja (2000) examined firms within the chemical industry, concluding that direct and indirect ties have a positive effect on firm invention activity, while structural holes have a negative effect. Fritsch and Kauffeld-Monz (2010) examine German organizations, including firms, universities, and non-university research institutes that were members of regional innovation networks. This study indicates that strong ties, as distinguished from weak ties by the degree of trust between direct connections, have a stronger positive effect on knowledge transfer than weak ties.
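Returning to the product-space discussion above, the Hidalgo et al. (2007) proximity between two products is the minimum of the two conditional probabilities that a country exporting one product also exports the other. A minimal sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(7)

# toy binary country-by-product matrix: 1 if the country exports the
# product with revealed comparative advantage (synthetic data)
M = (rng.random((20, 8)) > 0.6).astype(int)   # 20 countries, 8 products

def proximity(M):
    """Hidalgo et al. (2007) product proximity: the minimum of the two
    conditional probabilities that a country exporting one product also
    exports the other."""
    co_export = M.T @ M          # product-by-product co-export counts
    ubiquity = M.sum(axis=0)     # countries exporting each product
    with np.errstate(divide="ignore", invalid="ignore"):
        conditional = co_export / ubiquity   # P(export p | export q)
    phi = np.minimum(conditional, conditional.T)
    np.fill_diagonal(phi, 0.0)
    return np.nan_to_num(phi)

phi = proximity(M)
print(np.round(phi, 2))
```

The resulting matrix phi defines the weighted "product space"; the density of a country's exports around a product it does not yet make predicts the probability of entry.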
Table 1 Definitions of network theory and social network analysis concepts

Network flow model: Resources flow between nodes along paths. This model implies connection and disconnection, wherein disconnection represents the distance between two nodes (Scott and Carrington 2011).

Scale-free network: A network structure in which a set of highly connected central nodes causes the mean distance between any two nodes to be relatively short. Thus, most nodes within the network are relatively close, and information can be transferred efficiently (Barabasi and Bonabeau 2003).

The strength of weak ties: This concept rests on two premises. First, as the strength of a connection between two nodes increases, the probability that the nodes will share common connections increases. Second, ties between two previously unconnected networks have a higher probability of communicating novel information than ties within existing networks. It follows that weak ties are more likely to be sources of novel information (Scott and Carrington 2011).

Structural holes: This concept can be illustrated by two scenarios, in each of which a single node is surrounded by a cloud of other nodes. The cloud of nodes in the first scenario has fewer connections among its members than in the second. The concept predicts that the central node of the first scenario is more likely to acquire novel information, on the premise that nodes that are less connected are more likely to hold unique information than nodes that are more connected (Scott and Carrington 2011).

Reachability: Nodes are reachable if a path can be traced between them. In a directional network, node reachability is not reciprocal (Scott and Carrington 2011).

Connectivity: This measures the strength of a connection as the number of pathways between two nodes (Scott and Carrington 2011).

Distance: This provides a measure of the time it takes for information to travel between nodes and is defined as the smallest number of connections between individuals. The measure can be constructed to take connection strength into account and to characterize overall network structure (see also chapter ▶ "Spatial Network Analysis" by Andris and O'Sullivan in this handbook and Scott and Carrington 2011).

Degree centrality: This measures how much a node maintains contact with other nodes, independent of directionality. It reflects the ability of a node to acquire or disseminate knowledge efficiently (see also chapter ▶ "Spatial Network Analysis" by Andris and O'Sullivan in this handbook and Wasserman and Faust 1994).

Closeness centrality: This measures the distance between a node and all other nodes within the network. It reflects the ability of a node to communicate quickly and efficiently with all other nodes (Wasserman and Faust 1994).

Betweenness centrality: This measures the extent to which a node occupies a position between pairs of nodes within a network. It reflects the degree to which a node may control the flow of information (see also chapter ▶ "Spatial Network Analysis" by Andris and O'Sullivan in this handbook and Wasserman and Faust 1994).

Network density: The proportion of all possible connections within a network that actually exist. This measure provides information on the speed with which information diffuses within a network (Scott and Carrington 2011).

Prestige: This reflects the extent to which a node receives direct ties while itself initiating few relations (Wasserman and Faust 1994).
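Most of the measures in Table 1 map directly onto standard functions in the Python library networkx, as the short sketch below shows on a toy graph. The edges are hypothetical and stand in for, say, co-invention ties between firms.

```python
import networkx as nx

# toy collaboration network; edges might represent co-invention ties
G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
              ("D", "E"), ("E", "F"), ("D", "F")])

print("degree centrality:     ", nx.degree_centrality(G))
print("closeness centrality:  ", nx.closeness_centrality(G))
print("betweenness centrality:", nx.betweenness_centrality(G))
print("density:               ", nx.density(G))
# distance between two nodes: the smallest number of connections
print("distance A-F:          ", nx.shortest_path_length(G, "A", "F"))
```

In this graph, nodes C and D bridge the two triangles, and their high betweenness scores reflect the information-control position that Table 1 describes.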
In sum, this research stream indicates that node position and connection strength each have an effect on knowledge transfer.

Researchers have also examined network structure, which provides information on the efficiency of knowledge transfer. Dyer and Nobeoka (2000) observe that Toyota suppliers have mechanisms that induce network strength, such as inter-firm employee transfers, and that information flows freely between firms. This suggests that as overall network strength increases, the information flow between firms increases. Choe and colleagues (2013) examined the network structure and node positions within the organic photovoltaic cells field in the United States. Using patent citations, the authors constructed networks between firms, countries, and technology fields. This field can be characterized as scale-free, suggesting that information is transmitted efficiently relative to a randomly distributed network (Barabasi and Bonabeau 2003). Choe and colleagues (2013) also identified the key nodes by calculating degree centrality scores at each unit of analysis. Wang and colleagues (2017) examined how the network structure of the semiconductor industry in the United States has evolved, using firm-level centrality measures. Here collaborations were classified as strong ties and patent citations as weak ties. The authors suggest that the efficiency of knowledge spillovers through weak ties is higher than through strong ties, due to the higher density of weak ties relative to strong ones. Thus, the authors conclude that weak ties, defined as patent citations, are more efficient than strong ties, defined as collaborations.

Choe and colleagues (2013), as well as Wang and colleagues (2017), provide the first applications of social network analysis to strategic alliances within the context of knowledge spillover theory – a significant contribution to the field. The authors suggest that the existing network structures are efficient and that knowledge spillovers are transferred more efficiently through patent citations. Each of these conclusions is based on predictions derived from network theory. In future studies it may be useful to test such predictions empirically. For example, network theory predicts that denser networks transfer information more efficiently; if this were true, one might expect that as network density increases, invention or innovation activity would increase as well.

Shih and Chang (2009) examined the international diffusion of technology using patent citations and trade patterns, observing that these measures construct similar networks. Further, they classify countries based on whether they generated, received, or both generated and received knowledge. A particular strength of this research is the use of directionality. This, coupled with the two connection measures (patent citations and trade patterns), enabled the authors to glean more detailed observations from their network. Directionality has previously been measured using prestige; although this concept is not explicitly used here, the research highlights the information that could be gained from considering prestige.
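The scale-free characterization mentioned above can be checked on a citation network by inspecting its degree distribution. The sketch below uses a preferential-attachment graph as a stand-in for real citation data and fits a crude log-log slope; note that this simple binned-OLS estimate is known to be biased, and proper tests use maximum-likelihood fits over the distribution tail.

```python
import networkx as nx
import numpy as np

# stand-in for a citation network; a preferential-attachment graph is
# scale-free by construction
G = nx.barabasi_albert_graph(n=5000, m=3, seed=1)
degrees = np.array([d for _, d in G.degree()])

# crude log-log slope of the degree distribution; a roughly straight,
# heavy-tailed profile is the scale-free signature
values, counts = np.unique(degrees, return_counts=True)
mask = counts > 1                       # drop singleton degree values
slope, _ = np.polyfit(np.log(values[mask]), np.log(counts[mask]), 1)
print(f"approximate power-law exponent: {-slope:.2f}")
```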

4 Conclusions and Future Research

Fundamentally, knowledge spillovers represent the transfer of knowledge from one agent to another who is working on similar topics, which results in economic gains. Knowledge spillover research has established the key actors and the channels through
which knowledge flows, as well as indirect measures of knowledge transfer. Concurrently, the field of network theory has blossomed, providing a theoretical basis for constructing hypotheses and methods for characterizing node position and network structure. At this point, studies have incorporated network theory and social network analysis into the field of knowledge spillovers to provide descriptive analyses of node position and network structure. Further, a limited number of studies have examined the effect of node position on invention activity and of network structure on the efficiency of knowledge transfer. Looking forward, it may be helpful to construct studies based on panel datasets that incorporate quasi-experimental designs to estimate causal effects. Novel data collection methods, such as the scraping of social media, also provide a wealth of detailed information not previously available. Given this, further incorporation of network theory and social network analysis methods into knowledge spillover research may be very fruitful.

Thus far, within the context of knowledge spillovers, network theory and social network analysis have been applied to social and cognitive proximity. However, given that geographic proximity is often a precondition for social proximity, in the future it may be useful to conduct investigations using spatial network analysis (see chapter ▶ "Spatial Network Analysis" by Andris and O'Sullivan in this handbook). For example, these investigations could consider the effect of infrastructure changes on knowledge spillovers. Broadly, promising avenues of future research may involve measuring the effect of network structure and position on knowledge spillovers. Here, we organize our discussion of potential research avenues by unit of analysis: the agent and the network.

At the agent level, researchers have considered the impact of network position on knowledge generation. Looking forward, this stream of literature could be expanded in multiple ways. Thus far researchers have typically utilized a single social network analysis measure in their studies; in the future it may be beneficial to estimate the differential effects of various measures. For example, estimating the differential effects of betweenness and closeness on firm innovative activity may be instructive. Given that, within the context of knowledge spillovers, the acquisition of novel information is expected to be more valuable than the acquisition of a larger volume of information, measures that are theorized to reflect this (such as betweenness centrality) may be expected to better predict knowledge generation. To illustrate, consider using the strength of weak ties concept to construct hypotheses: as the number of a firm's bridging connections increases, the probability of acquiring, and thus generating, novel information is expected to increase. Further, the quality of the connected actors may also be of great interest. For example, it is expected that as the human capital of a researcher's network increases, their ability to generate novel information increases, so it may be important to take this into account. Therefore, a measure such as status prestige, which takes into account both directionality and the prestige of connections, may be of interest here. Hidalgo and colleagues (2007), as well as Rigby (2015), estimated the effect of knowledge relatedness, at the country and regional levels, on the probability that the agent generates knowledge in a related field.
Given this, it may be fruitful to consider how this may be applied at lower units of observation, such as the firm or individual level.
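A study of the kind proposed above, estimating the differential effects of betweenness and closeness on invention activity, might look like the following sketch. The network and the patent counts are entirely synthetic, with the betweenness effect loaded in by construction, so the sketch only demonstrates the estimation mechanics (a Poisson regression via statsmodels), not an empirical finding.

```python
import networkx as nx
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

# synthetic inter-firm network; nodes stand in for firms
G = nx.barabasi_albert_graph(n=200, m=2, seed=3)
betweenness = nx.betweenness_centrality(G)
closeness = nx.closeness_centrality(G)

X = np.column_stack([[betweenness[i] for i in G.nodes],
                     [closeness[i] for i in G.nodes]])
# synthetic patent counts, loaded on betweenness purely by construction
lam = np.exp(1.0 + 8.0 * X[:, 0] + 0.5 * X[:, 1])
y = rng.poisson(lam)

# Poisson regression of invention counts on the two centrality measures
model = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson()).fit()
print(model.params)   # constant, betweenness, closeness
```

With real data, the relative size and significance of the two coefficients would speak to whether novelty-capturing positions (betweenness) or communication-efficient positions (closeness) better predict invention.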
Researchers could also consider the effect of network structure on knowledge spillovers. For example, researchers could estimate the effect of the mean network distance on the volume of spillovers observed in a network. This requires the comparison of multiple networks, either one network over time or multiple distinct networks. Thus far a limited number of studies have utilized panel datasets, which may be important for such studies. This also raises endogeneity concerns and the need to incorporate quasi-experimental methods to overcome them. Notably, prior research has not measured the number of ties within and between groups of nodes within a network. Network theory proposes that novel information is more likely to be acquired from nodes external to an existing group. Thus, comparison of the number of connections between and within groups provides information on which nodes and groups are expected to receive, and subsequently generate, novel information (a minimal sketch of such a comparison appears at the end of this section).

At this point, examination of the interaction of network position or structure with other characteristics, such as human capital, has been limited. Studies that incorporate these measures may allow for the comparison of relative effects and the observation of interaction effects. For example, two nodes may have similar network positions, suggesting that the volume and novelty of the information they will receive, and generate, is similar. However, the human capital, or absorptive capacity, of each node may differ; thus, their ability to integrate the knowledge they receive into their existing knowledge base will differ. Further, the nodes they are connected to may have different characteristics, such that the information they are able to acquire differs.

Two current trends help predict the future directions of data collection. First, the increasing availability of firm- and individual-level data provides the ability to examine knowledge spillover questions at a less aggregated level. Second, the increased prevalence of social media, such as Twitter and LinkedIn, provides new opportunities to observe informal connections between individuals, such as common educational affiliations. As a whole, increasing data availability provides the opportunity to observe knowledge spillovers at a less aggregated level and to observe previously unobservable channels. So far, for various reasons, data on innovations have been limited; in the future, it would therefore be advantageous to collect innovation data, such that the innovation process can be more directly observed.

Prior research primarily utilized a single data artifact to construct networks. However, Shih and Chang (2009) incorporated both patent citations and trade patterns into their analysis, providing the ability to examine how countries both generate and receive knowledge. In future research, it may be useful to take a similar approach. For example, at the individual level, one could layer information gleaned from social media onto formal measures of knowledge transfer, such as patent citations, to examine how social connections impact knowledge spillovers. This additional data would provide the ability to glean more detailed insights.

Thus far considerable progress has been made within the field of knowledge spillovers. The primary agents that generate and transfer knowledge, and the channels through which knowledge spillovers flow, have been identified, and researchers have begun to utilize network theory and social network analysis methods to examine questions within the field.
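As promised above, the within-group versus between-group tie comparison can be sketched in a few lines. The two-group graph and the group labels below are hypothetical; on real data the groups might be industries, regions, or alliance blocs.

```python
import networkx as nx

# toy two-group network with a single bridging tie
G = nx.Graph([(1, 2), (1, 3), (2, 3),     # group "a"
              (4, 5), (4, 6), (5, 6),     # group "b"
              (3, 4)])                    # tie between the groups
group = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}

within = sum(group[u] == group[v] for u, v in G.edges)
between = G.number_of_edges() - within
print(f"within-group ties: {within}, between-group ties: {between}")

# nodes incident to between-group ties are the candidate bridges that
# network theory expects to receive (and generate) novel information
bridges = {n for u, v in G.edges if group[u] != group[v] for n in (u, v)}
print("bridging nodes:", sorted(bridges))
```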


Thus far, considerable progress has been made within the field of knowledge spillovers. The primary agents that generate and transfer knowledge, and the channels through which knowledge spillovers flow, have been identified. Further, researchers have begun to utilize network theory and social network analysis methods to examine questions within the field. Given the advances in data collection and the further application of network theory and social network analysis to knowledge spillover research, the future of the field looks bright.

5 Cross-References

▶ Knowledge Flows, Knowledge Externalities, and Regional Economic Development
▶ Networks in the Innovation Process
▶ Social Network Analysis
▶ Spatial Network Analysis

References
Adams J (2002) Comparative localization of academic and industrial spillovers. J Econ Geogr 2(3):253–278
Agrawal AK, Cockburn IM, McHale J (2006) Gone but not forgotten: knowledge flows, labor mobility, and enduring social relationships. J Econ Geogr 6(5):571–591
Ahuja G (2000) Collaboration networks, structural holes, and innovation: a longitudinal study. Adm Sci Q 45(3):425–455
Alcácer J, Gittelman M (2006) Patent citations as a measure of knowledge flows: the influence of examiner citations. Rev Econ Stat 88(4):774–779
Barabasi A-L, Bonabeau E (2003) Scale-free networks. Sci Am 288:50–59
Branstetter L (2006) Is foreign direct investment a channel of knowledge spillovers? Evidence from Japan's FDI in the United States. J Int Econ 68(2):325–344
Breschi S, Lissoni F (2009) Mobility of skilled workers and co-invention networks: an anatomy of localized knowledge flows. J Econ Geogr 9(4):439–468
Choe H, Lee DH, Seo IW, Kim HD (2013) Patent citation network analysis for the domain of organic photovoltaic cells: country, institution, and technology field. Renew Sust Energ Rev 26:492–505
Coe DT, Helpman E (1995) International R&D spillovers. Eur Econ Rev 39(5):859–887
Dahl M, Pedersen C (2004) Knowledge flows through informal contacts in industrial clusters: myth or reality? Res Policy 33(10):1673–1686
Dyer JH, Nobeoka K (2000) Creating and managing a high performance knowledge-sharing network: the Toyota case. Strateg Manag J 21(3):345–367
Feldman MP (1994) The geography of innovation. Kluwer Academic, Boston
Feldman MP (1999) The new economics of innovation, spillovers and agglomeration: a review of empirical studies. Econ Innov New Technol 8(1–2):5–25
Feldman MP, Kogler DF (2010) Stylized facts in the geography of innovation. In: Hall BH, Rosenberg N (eds) Handbook of the economics of innovation. Elsevier, New York, pp 381–410
Filatotchev I, Liu X, Lu J, Wright M (2011) Knowledge spillovers through human mobility across national borders: evidence from Zhongguancun Science Park in China. Res Policy 40(3):453–462
Fritsch M, Kauffeld-Monz M (2010) The impact of network structure on knowledge transfer: an application of social network analysis in the context of regional innovation networks. Ann Reg Sci 44:21–38
Griliches Z (1992) The search for R&D spillovers. Scand J Econ 94(Supplement):29–47
Hidalgo CA, Balland P, Boschma R, Delgado M, Feldman M, Frenken K, Glaeser E, He C, Kogler DF, Morrison A, Neffke F, Rigby D, Stern S, Zheng S, Zhu S (2018) The principle of relatedness. In: Morales AJ, Gershenson C, Braha D, Minai AA, Bar-Yam Y (eds) Unifying themes in complex systems IX. Springer, New York, pp 451–457
Hidalgo CA, Klinger B, Barabási AL, Hausmann R (2007) The product space conditions the development of nations. Science 317(5837):482–487
Jaffe A, Trajtenberg M, Henderson R (1993) Geographic localization of knowledge spillovers as evidenced by patent citations. Q J Econ 108(3):577–598
Keller W (1998) Are international R&D spillovers trade-related? Analyzing spillovers among randomly matched trade partners. Eur Econ Rev 42(8):1469–1481
Liu X, Lu J, Filatotchev I, Buck T, Wright M (2010) Returnee entrepreneurs, knowledge spillovers and innovation in high-tech firms in emerging economies. J Int Bus Stud 41(7):1183–1197
Owen-Smith J, Powell WW (2004) Knowledge networks as channels and conduits: the effects of spillovers in the Boston biotechnology community. Organ Sci 15(1):5–21
Phene A, Tallman S (2014) Knowledge spillovers and alliance formation. J Manag Stud 51(7):1058–1090
Rigby DL (2015) Technological relatedness and knowledge space: entry and exit of US cities from patent classes. Reg Stud 49(11):1922–1937
Scott J, Carrington PJ (eds) (2011) The SAGE handbook of social network analysis. SAGE Publications, London
Shih H-Y, Chang T-LS (2009) International diffusion of embodied and disembodied technology: a network analysis approach. Technol Forecast Soc Chang 76(6):1–45
Thompson P, Fox-Kean M (2005) Patent citations and the geography of knowledge spillovers: a reassessment. Am Econ Rev 95(1):450–460
Wang C-C, Sung H-Y, Chen D-Z, Huang M-H (2017) Strong ties and weak ties of the knowledge spillover network in the semiconductor industry. Technol Forecast Soc Chang 118:114–127
Wasserman S, Faust K (1994) Centrality and prestige. In: Social network analysis: methods and applications. Cambridge University Press, Cambridge, UK, pp 169–219

Clusters, Local Districts, and Innovative Milieux

Michaela Trippl and Edward M. Bergman

Contents
1 Introduction
2 Core Concepts
2.1 The Marshallian Concept
2.2 The Neo-Marshallian Industrial District Concept
2.3 The Innovative Milieu Concept
2.4 The Industrial Cluster Concept
3 Similarities and Dissimilarities
3.1 Geographies and Space
3.2 Actors and Interactions
3.3 Industries and Innovation
3.4 Environment, Competition, and Cooperation
4 Conclusions
References

Abstract

Over the last three decades, literature on industrial districts, innovative milieus, and industrial clusters has enriched our knowledge about endogenous factors and processes driving regional development and the role of the region as an important level of economic coordination. This class of stylized development concepts has emerged since the 1970s and attempts to account for successful regional adaptations to changes in the global economic environment. Each of these concepts grew out of specific inquiries into the causes of economic success to be found in the midst of general decline by building upon the early ideas of Alfred Marshall in several ways. Neo-Marshallian districts found in Italy highlight the importance of small firms supported by strong family and local ties, while the innovative milieu concept places great emphasis on the network structure of institutions to diffuse externally sourced innovations to the local economy. Clusters have become far more general in scope, fruitful in theoretical insights, and robust in application, informing the work of both academics and policymakers around the world.

Keywords
Small firm · Knowledge spillover · Industrial district · Industrial cluster · External economy

M. Trippl (*)
Department of Geography and Regional Research, University of Vienna, Vienna, Austria
e-mail: [email protected]

E. M. Bergman
Institute for Multi-Level Governance and Development, Vienna University of Economics and Business, Vienna, Austria
e-mail: [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_26

1 Introduction

When one first becomes interested in the growth or development implications of local or regional collections of firms and industries, a bewildering array of possibilities presents itself. One learns that external economies of scale accrue to firms that colocate and thereby stimulate growth; for example, pecuniary savings and intangible flows of information arise automatically from agglomeration, benefiting resident firms. External economies of scale are defined as economies that are external to the firm but internal to the region. They can be divided into localization economies (benefits from colocation accruing to firms operating in the same industry) and urbanization economies (benefits from colocation accruing to firms operating in different industries). A robust and wide-ranging literature in this tradition does exist, guiding the theoretically inclined or those who wish to examine empirical tests of important propositions.

Then there are the development practitioners, academics, or students who rely upon a family of more stylized models to formulate development prospects. Such models frequently adopt the concepts that are most widely accepted or formulated in their home territories, the bulk of which date from the 1970s. The emergence of these concepts can be documented with Google Ngrams for books published in English between 1800 and 2000 (http://books.google.com/ngrams); a minimal query sketch appears at the end of this section. The 1970s was a period in which traditional production methods and their centers lagged, while more peripheral areas, focused increasingly on better or higher technologies and market niches, began to prosper from the bottom up. Italians are quite familiar with "neo-Marshallian industrial districts," while the French and many Swiss prefer insights drawn from the "innovative milieu," and contemporary English or German speakers – and many others – are most comfortable with the ideas born of "industrial clusters."

The concepts discussed in this chapter all go beyond a purely economic view of agglomerated industries (i.e., the argument of external economies of scale), drawing – although to varying degrees – attention to the social and institutional factors that allow for the coordination of economic actors. The preferred notions arose in specific contexts and circumstances, but are they essentially similar, or are there elemental differences worth emphasizing? In this chapter, we focus on both the common and the distinctive elements of these concepts. To get under way, we first examine the core foundation on which each of these rests, after which representative definitions are drawn from the contemporary literature to distinguish between Marshallian and neo-Marshallian industrial districts, innovative milieux, and industrial clusters. We then compare these with the definitions found in their seminal literatures as they emerged. Primary attention is paid to features that reveal common overlaps and how each concept addresses these points, while also noting uniquely specific features that clearly distinguish among them. Finally, we comment on challenges for future research and application.
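As a rough illustration of how such Ngram evidence can be pulled programmatically, the sketch below queries the JSON endpoint behind the Google Ngram Viewer. This endpoint is unofficial and undocumented, so the URL and parameters (notably the corpus label) are assumptions that may change without notice; treat the code as illustrative rather than a stable API.

```python
# Sketch: relative frequency of each concept in English-language books,
# via the unofficial JSON endpoint behind the Google Ngram Viewer.
# The endpoint and the 'corpus' label are undocumented assumptions.
import requests

phrases = ["industrial district", "innovative milieu", "industrial cluster"]
resp = requests.get(
    "https://books.google.com/ngrams/json",
    params={
        "content": ",".join(phrases),
        "year_start": 1800,
        "year_end": 2000,
        "corpus": "en-2019",  # assumed corpus label; may differ
        "smoothing": 3,
    },
    timeout=30,
)
resp.raise_for_status()

for series in resp.json():
    ts = series["timeseries"]  # one relative frequency per year
    peak_year = 1800 + max(range(len(ts)), key=ts.__getitem__)
    print(f"{series['ngram']!r} peaks around {peak_year}")
```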

2 Core Concepts

2.1 The Marshallian Concept

The English economist Alfred Marshall is seen as the "father" of the industrial district concept, and most contemporary work on agglomeration, localization, clustering, and the innovation-enhancing effects related to the geographical concentration of firms is explicitly or at least implicitly based on his writings. The key element in Marshall's theorizing about "localized industries" (defined as industries concentrated in certain localities) is the notion of external economies of scale (see section "Introduction"). It is these effects that enable small firms colocated with others in a district to compete successfully with large vertically integrated firms, which take advantage of internal economies of scale (i.e., the benefits of large-scale production). There were ample examples of such geographical settings in late-1800s Britain, such as cotton and textiles in Lancashire, cutlery in Sheffield, pottery in Staffordshire, and straw plaiting in Bedfordshire.

Marshall's writings contain explanations both for the rise of localized industries and for their long-term "anchoring" in districts. According to Marshall, the initial localization of industries can have many sources, such as the availability of raw materials, the demand for goods of high quality, or the immigration of people with specialized skills. However, once an industry is spatially concentrated in a particular locality, a set of agglomeration forces keeps it in place: "When an industry has thus chosen a locality for itself, it is likely to stay there long: so great are the advantages which people following the same skilled trade get from near neighbourhood to one another" (Marshall 1920, p. 225). These advantages are threefold, and they can be conceptualized as positive external economies of scale:

Knowledge spillovers: Firms benefit from local knowledge circulation and manifold opportunities for monitoring, learning from, and imitating the innovative actions of colocated firms. Marshall (1920, p. 225) notes: "The mysteries of trade become no mysteries; but are as it were in the air, and children learn many of them unconsciously. Good work is rightly appreciated, inventions and improvements in machinery, in processes and in general organization of the business have their merits promptly discussed: if one man starts a new idea, it is taken up by others and combined with suggestions of their own; and thus it becomes the source of further new ideas."

Rise of supplier industries: Colocated firms operating in the same industry also take advantage of the growth of specialized supplier industries in their neighborhood, supplying the localized industry with raw materials, intermediate products, and services.

Labor market effects: The localization of industries promotes the emergence of a highly specialized labor market that attracts both firms and workers. Employers benefit from the ready availability of highly skilled workers with the required special qualifications, and workers can take advantage of the rich employment opportunities, allowing them to find adequate jobs rather easily.

According to Marshall, the spatial concentration of firms of moderate size in districts was a widespread phenomenon – indeed, the typical form of industrial organization – of the English economy in the late nineteenth century, one that could be observed in many different industries and not merely a few. Marshall's formulation of the industrial district concept contains a profound analysis of the linkages that knit colocated small firms together. His work points to knowledge spillovers, producer-supplier relations, and labor mobility among district firms and conceptualizes such relations as key factors underpinning the innovation capacity of localized industries.

Marshall's work went beyond purely economic factors and also considered sociocultural and institutional assets to explain the economic vibrancy of industrial districts. A key notion in his writings is that of an "industrial atmosphere," that is, the presence of a collective identity and shared industrial expertise that gradually develops in industrial districts and facilitates interaction, localized knowledge circulation, and the creation and diffusion of innovation. Marshall was also aware of the potential dangers of specialized local development, anticipating the downturn of many English districts in the first decades of the twentieth century as well as what happened many years later in old industrial areas characterized by economic mono-structures (i.e., overspecialization in mature sectors such as shipbuilding, coal, and steel). He considered districts that are dependent on one industry to be extremely vulnerable, pointing to the risk of depression, crisis, and decline in the case of changing context conditions such as, for instance, a fall in demand for a district's products or changes in technology.

In the era of mass production organized within large hierarchical firms, Marshall's ideas lost importance. His work on localized external economies of scale was brought back to life from the 1960s onward. Kenneth Arrow (1962) and Paul Romer (1986) extended Marshall's reasoning on knowledge spillovers as a key source of innovation and growth. Such positive localization effects resulting from the colocation of firms belonging to the same industry (intra-industry spillovers) are now known as Marshall-Arrow-Romer (MAR) externalities in regional science (Glaeser et al. 1992). They can and should be clearly distinguished from Jacobs externalities, that is, urbanization effects which arise from the presence of firms operating in different industries (interindustry spillovers). The MAR literature provides a purely economic view of agglomeration phenomena, decoupled from Marshall's accounts of the supporting role of the social and cultural factors referred to as "industrial atmosphere" in his writings. However, Arrow and Romer were not the only scholars who rediscovered Marshall. In the 1970s and early 1980s, Italian researchers started to draw extensively on his ideas, extending rather than downplaying the weight of the sociocultural underpinnings of localized specialized economic activity in their conceptual and empirical analyses.

2.2 The Neo-Marshallian Industrial District Concept

The end of the golden age of mass production and the crisis of the large hierarchically organized firm in the 1970s provoked renewed interest in Alfred Marshall's work on external economies of scale and industrial districts. His basic ideas were revitalized and further developed by a group of Italian researchers who studied industrial development in the central and northeastern parts of Italy (the so-called Third Italy, with its main regions Emilia-Romagna, Tuscany, and Veneto). This group was dealing with a somewhat puzzling phenomenon: the economic success of industries with seemingly outdated organizational forms (family-owned small companies) belonging to mature sectors with limited growth prospects (such as textiles, footwear, leather goods, and furniture).

Giacomo Becattini is one of the leading exponents of research on neo-Marshallian industrial districts. He rediscovered Marshall's concept of industrial districts and reformulated it for the specific context of the Third Italy. Becattini's article "From the industrial 'sector' to the industrial 'district'" (published in Italian in the journal Rivista di economia e politica industriale (Becattini 1979) and in English in 1989 in a book edited by Edward Goodman and Julia Bamford) is usually considered the starting point of scholarly work on neo-Marshallian industrial districts (Landström 2005). The international dissemination of the concept was essentially promoted by Michael Piore and Charles Sabel. In their book The Second Industrial Divide (1984), they drew heavily on the example of neo-Marshallian industrial districts in the Third Italy to support their thesis of a major transformation from Fordist mass production toward a model of flexible specialization. Looking at the Tuscan economy, Becattini (2003, p. 17) noted: "To employ a concept much used by Alfred Marshall, the course of Tuscan history leads to a form, still incomplete but already clear in outline, of 'industrial district' . . . which produces economies external to the single firm and even to the industrial sector defined by technology, but internal to the 'sectorial-social-territorial' network."

Becattini and other protagonists of the neo-Marshallian industrial district notion (such as Gabi Dei Ottati, Sebastiano Brusco, Marco Bellandi, and Patrizio Bianchi, to name just a few) reject a purely economic view of local industrial growth and an exclusive focus on the economic effects of agglomeration. They suggest a much broader perspective that takes into consideration the social, cultural, and institutional foundations of local development. Becattini has advanced the idea of neo-Marshallian industrial districts as complex socioeconomic settings and highlighted the relation between the efficiency and competitiveness of production and the sociocultural conditions prevailing at the regional level. Becattini (1990, p. 36) defines neo-Marshallian industrial districts as ". . . a socio-territorial entity which is characterised by the active presence of both a community of people and a population of firms in one naturally and historically bounded area. In the district, unlike in other environments, such as manufacturing towns, community and firms tend to merge."

Neo-Marshallian industrial districts in the Third Italy are characterized by a set of common features (see also Landström 2005). First (and accordingly emphasized in the definition presented above), in these districts production activities and "daily life" tend to merge. This merger is the outcome of a pattern of intensive interactions between the community of firms and families along a variety of dimensions. Second, there is a high division of labor among small companies, which specialize in specific phases of the production process and work together in flexible teams. The manufacturer of the final good (the so-called impannatore) usually leads these teams and interacts with the market. Third, a widely shared value system (e.g., a specific work ethic, principles of reciprocity) shapes action and interaction in the local community. These values are diffused throughout the district by a dense network of institutions (such as the firm, the family, the church, the local government, and business associations). Fourth, in Third Italy's neo-Marshallian industrial districts, competition and cooperation tend to coexist. The firms are linked to each other by manifold relationships, competing in some fields and collaborating in others. Finally, neo-Marshallian industrial districts benefit from a credit system with locally and socially embedded banks, which have deep knowledge about, and close linkages to, the community of the districts' firms.

From the 1980s onward, one could observe a rise in the importance of local and regional policy interventions in Italian industrial districts. Policy-makers supported the evolution of districts by providing infrastructure (industrial parks, real service centers) and collective services (financing, education, marketing) and by performing the role of "social coordinators" (bringing actors together to solve common problems).

The 1990s saw growing skepticism about the long-term competitiveness and growth dynamics of neo-Marshallian industrial districts. Radical technological innovation and the globalization of the economy have led many authors to raise doubts about the extent to which such districts are a superior and stable spatial configuration of industrial production (see, for instance, Guerrieri et al. 2001). Critics have pointed to the limitations of self-sustaining production systems dominated by small firms in the global economy. Furthermore, the attempt made by Piore and Sabel (1984) to draw from very unique (and hardly generalizable) cases such as the Third Italy to develop their notion of flexible specialization has been viewed with extreme skepticism (see, for instance, Storper 1997).


In recent years, a rich body of empirical work has emerged (for a review, see Rabellotti et al. 2009), documenting and analyzing ongoing structural changes in Italian districts. Based on a review of this literature, Rabellotti et al. (2009) found that some districts disappeared as a result of crises in their area of specialization (e.g., textile districts in Lombardy and Veneto). In most cases, these districts were specialized in low-cost production and failed to compete successfully with manufacturers in newly emerging countries. Other districts have changed specialization, shifting from the production of final goods to the production of the machinery required for their manufacture. There are also districts that have conquered international luxury markets by upgrading the quality of their products, while others have enhanced their technological capabilities. Interestingly, the typical small Italian industrial district firm seems to be losing importance. Indeed, the evidence suggests that medium-sized companies and groups of firms are now the most dynamic agents and the key driving forces of structural change (Rabellotti et al. 2009).

2.3 The Innovative Milieu Concept

The decline of many traditional industrial centers throughout Europe in the 1970s, coupled with the subsequent rise of technological advances in industry and the emergence of innovative peripheral regions, prompted serious rethinking of centrally directed and supported industrial development poles, often French in origin, in favor of innovatively driven indigenous growth. In the mid-1980s, Philippe Aydalot hypothesized that there is "something" localized and intangible that permits innovation and dynamic development to proceed in certain regions and not in others (Crevoisier 2004). This observation set in motion a series of research efforts titled GREMI (Groupe de Recherche Européen sur les Milieux Innovateurs) to investigate and promote the dynamism seen in what became known as the "innovative milieu." The concept of the innovative milieu, however, has never ventured much beyond Francophone readers. Research on milieus tends to be strongly focused on high-tech industries and on growth regions with extensive innovation intensity, although a few studies on conservative milieus in stagnating or declining traditional industrial regions and their restructuring processes also exist (see, for instance, Aydalot 1988; Maillat et al. 1997).

Definitions of the innovative milieu vary, but most protagonists of this concept share the view that a milieu may be described as a set of region-specific rules, practices, and institutions that enhances the capacity of regional actors to innovate and to coordinate with other innovative actors (Storper 1997). In the writings of the GREMI group, the milieu is not always clearly distinguished from networks. As Storper (1997, p. 17) notes: "Many of the milieu theorists use the 'network' as their principal organizational metaphor. For some, the milieu is itself a network of actors . . . in a region. For others, the network concerns the input-output system; it is the network that is embedded in a milieu, and the milieu provides members of the network what they need for coordination, adjustment and successful innovation."

The directions taken by GREMI members reflect what Benko and Desbiens (2004, p. 325) see as "a genuine 'territorial turn' that can be characterized by a movement from economics toward geography; this renewal is influenced by the cultural turn . . . that has been expanded through the influence of traditional economics and sociology." This reorientation has profound implications for how one looks at regional development. Accordingly, the focus moves away from MAR externalities enjoyed by individual economic units or agents and their industrial interactions and gravitates instead toward an examination of the full pattern of structural linkages among institutions that diffuse and enhance the innovative potential of a territory. Rather than the product, supply, labor, or material flows among firms that constitute a local economy, one looks to the overarching structure of institutional networks through which innovations and ideas – often external in origin – pulse and bring prosperity to regions. For example, Camagni (1991a, p. 3) sees an innovative milieu as "the set, or the complex network of mainly informal social relationships on a limited geographical area, often determining a specific external 'image' and a specific internal 'representation' and sense of belonging, which enhance the local innovative capability through synergetic and collective learning processes. . . The attraction of external energies and know-how is exactly the objective we assign to innovative networks: through formalized and selective linkages with the outside world. . . local firms may attract the complementary assets they need to proceed in the economic and technological race." This network is not seen as passive or merely contextual, but rather as a robust, proactive, and enabling agent-like presence: ". . . it is often the local environment which is, in effect, the entrepreneur and innovator, rather than the firm. The firm is not an isolated agent of innovation: it is one element within the local industrial milieu which supports it" (Aydalot and Keeble 1988, p. 9).

Despite considerable research from subsequent GREMI investigations, the question of how the desirable externalities arose in the first place remains unclear. Simmie (2005, p. 793) concludes from his survey of the literature that "Explanations slip all too easily into the argument that innovative milieu assist innovative firms while at the same time the presence of innovative firms create the innovative milieu that are supposed to be assisting them." However one reads this literature, it would seem that firms and industries have become secondary or complementary components of a local economy, which rely upon the linked institutions of their innovative milieu to acquire key technological assets. Crevoisier and Maillat (1991) have traced some of the possible connections between the innovative milieu and the underlying markets, sectors, industrial organization, etc., giving rise to what has become more generally known as a territorial production system. However, the territorial production system (TPS) remains a largely unspecified "black box" of idealized categorical types (e.g., industrial organization), which does not clearly identify the agents, elements, or incentives that constitute a working local economy. Storper (1997) notes that one of the key shortcomings of the literature on innovative milieus is that it fails to identify the economic logic by which a milieu promotes regional innovation. Little is said about why localization and territorial specificity should promote technological dynamics. Camagni (1991b) goes further in seeing the innovative milieu as an uncertainty-reducing mechanism that permits firms to better assess and deploy innovative resources, although he does not refer to a TPS in specifying several gaps to be overcome. The innovative milieu concept first launched important ideas about how innovations are introduced and exploited in local economies, but it was eventually overtaken by other, more convincing accounts of how innovation systems are populated and operate within regions.

2.4 The Industrial Cluster Concept

From the 1990s onward, the term "cluster" has become increasingly prevalent to denote spatial concentrations of firms. There are many different branches of this notion. As the term "cluster" is rather imprecise and its meanings can differ, various authors have attached to it the adjectives "industrial," "regional," "business," and "economic." It is beyond the scope of this chapter to deal with all these variants (for an overview, see Iammarino and McCann (2006) and Cruz and Teixeira (2010)). In this chapter, we focus on Michael Porter's cluster approach, which has become the most popular one.

In 1990, Porter published his highly influential book "The Competitive Advantage of Nations," which introduced his industrial cluster concept broadly to business and policy officials, and he rapidly propelled awareness of the concept well beyond the previous academic audience in a subsequent outpouring of books and articles. Clusters are defined as ". . . geographic concentrations of interconnected companies, specialized suppliers, service providers, firms in related industries, and associated institutions (for example, universities, standards agencies, and trade associations) in particular fields that compete but also cooperate" (Porter 1998, p. 197). Porter claims that clusters are a dominant feature of the landscape, that is, they can be found in virtually all countries and industries. The concept is rather flexible as regards the types of firms involved. Some clusters are made up of small firms only, while others host both small and larger ones (Porter 1990).

Unlike the previous descriptive accounts of colocation from which causal factors might be inferred, Porter brings both normative and positive insights to his cluster definition from long experience as a theorist of corporate strategy at Harvard Business School. His cluster view essentially complements the importance of corporate strategies to remain prepared to dismantle and reinvest/relocate individual components of the corporate value chain (headquarters, distribution, production, etc.) in clusters of competitive and productive enterprises, which are comprised of similar or related firm components.

Porter's main interest is in explaining the sources of enduring competitive advantage in a global economy. He argues that competitive advantage is strongly localized; it arises from clusters and is shaped by four determinants: (i) factor conditions (human capital, natural resources, infrastructure, etc.), (ii) demand conditions (sophisticated and demanding local customers), (iii) related and supporting industries (the presence of capable suppliers and competitive related industries), and (iv) firm strategy, structure, and competition. These four determinants of competitiveness are interrelated and form a self-reinforcing system, the so-called diamond. Among all elements of the diamond, rivalry among cluster actors is seen as the most important one because it has a stimulating effect on all the others. Fierce rivalry promotes the development of highly specialized input factors, upgrades domestic demand, and spurs the rise of related and supporting industries (Porter 1998). Vigorous competition among locally based rival firms is, thus, the main driving force in Porter's cluster model. However, he also states that clusters represent a combination of competition and cooperation, although the cooperative dimension of clusters is seen as less important than the competitive one. Porter's view on cooperation is rather specific. His writings reflect a deep skepticism when it comes to horizontal collaboration, pointing to a set of harmful effects that potentially arise from cooperation among competitors. Vertical cooperation (i.e., collaboration and knowledge exchange with suppliers, customers, and actors from related and supporting industries), in contrast, is seen to be beneficial.

Porter's work contains only some vague cues on the importance of social factors (more precisely, social capital) that facilitate interaction and collaboration between actors located in a cluster. Social (as well as cultural and institutional) features of clusters are clearly undertheorized in Porter's approach. Other points of critique that have been raised (Martin and Sunley 2003) include – among other things – Porter's failure to provide a clear specification of the geographical and industrial boundaries of clusters, the related difficulty of identifying clusters, and the uncritical reception of the approach in policy circles.

The undeniable attraction of Porter's cluster concept results from its twofold usefulness in providing answers to these questions: (i) What advice should a firm or corporate component consider in its corporate strategy to maintain maximum competitiveness when considering alternative business environments? And (ii) what advice should subnational regions consider when designing economic development policies aimed at providing attractive production sites and a sustainable employment base? Accordingly, many of the most ardent followers of this concept are business or economic development consultants and policy officials.

3 Similarities and Dissimilarities

It is already apparent from the brief outlines above that important differences but also similarities exist among the concepts considered in this chapter. To deal with these differences and overlaps more systematically and thereby reveal more about each, we will rely on a series of important features that are summarized in Table 1.

Table 1 Key features of concepts

Geographies and space
- Marshallian industrial districts: geographical distance/neighborhoods
- Neo-Marshallian industrial districts: common political district occupancy
- Innovative milieus: relational "space" coexisting with network
- Industrial clusters: MAR + globalized factors

Core actors
- Marshallian industrial districts: small firms
- Neo-Marshallian industrial districts: family/fluid firms, local officials
- Innovative milieus: social and political institutions, knowledge producers
- Industrial clusters: firms of all sizes + industry groups

Agent motivation
- Marshallian industrial districts: profit maximization
- Neo-Marshallian industrial districts: "mastery of craft" + place/product pride; consolidate high-value, luxury/income-elastic consumer markets
- Innovative milieus: growth, place/product pride; gain greater prominence in tech-based development initiatives
- Industrial clusters: MAR + ROI, seize market and production opportunities afforded by globalization

Interactions (relationships between actors)
- Marshallian industrial districts: MAR
- Neo-Marshallian industrial districts: MAR, long-term, trust-based local networks
- Innovative milieus: informal social relations, collective learning within the milieu, formalized extra-local networks
- Industrial clusters: MAR, supply/value chains, vertical forms of exchange, project alliances

Industries, technological intensity of products, and markets
- Marshallian industrial districts: all industries prevailing in the nineteenth century
- Neo-Marshallian industrial districts: consumer discretionary/low-medium craft
- Innovative milieus: high-tech intensive
- Industrial clusters: export-intensive industries/high-tech and low-tech

Innovation
- Marshallian industrial districts: diffusion of ideas, localized spilling over of trade secrets
- Neo-Marshallian industrial districts: craft refinement and perpetuation, flexible organizations, incremental innovation
- Innovative milieus: adoption of extrinsically produced innovations that can be regionally exploited
- Industrial clusters: MAR externalities, pre-commercial joint ventures, knowledge sourcing

Environment (cultural and institutional context, extraeconomic factors)
- Marshallian industrial districts: "industrial atmosphere"
- Neo-Marshallian industrial districts: strong family and community values/identity, blurred boundaries between business and community spheres
- Innovative milieus: shared values and visions – regional cohesion
- Industrial clusters: trust and social capital form the "glue" that binds cluster actors together

Competition and cooperation
- Marshallian industrial districts: mix of competition (driving force of districts) and cooperation (precondition for collective innovation benefits)
- Neo-Marshallian industrial districts: complex mix of competition and cooperation/high importance of cooperation as a coordination mechanism
- Innovative milieus: (poorly specified) balance of competition and cooperation
- Industrial clusters: fierce competition as key driving force/cooperation less important

3.1 Geographies and Space

All concepts are based upon a specific understanding of space and certain geographic factors that help bound and define each in particular ways. The Marshallian district was essentially limited by distances that could be economically traversed daily by workers and suppliers in the late nineteenth century; densities were sufficiently high and distances sufficiently short that districts could often be referred to as "neighborhoods." The late twentieth-century versions of neo-Marshallian industrial districts found in Italy are bounded instead within small governmental units known as political districts. The political boundaries are not matters of definitional convenience or regulatory requirement; rather, the local unit of government is an active partner and facilitator in the functioning of a neo-Marshallian industrial district. Innovative milieus are defined not at all by frictions of distance that separate the factors from individual producers or by official boundaries but instead by the "relational space" enclosed within a specific networked pattern of contacts among key institutions engaged in promulgating innovative impulses. Industrial clusters have perhaps the most flexible geographies, although they build upon Marshallian and MAR principles of market interaction that – with recent transportation and communication improvements – have expanded outward to include the "daily urban system" of entire metropolitan areas and regions. Indeed, Porter even speaks of a "California wine cluster," although this is almost certainly comprised of several smaller, distinctive clusters of wine producers organized around a particular "terroir" that can take advantage of California's full resources.

3.2 Actors and Interactions

Small- and medium-sized independent firms are the core agents in Marshallian districts. They are essentially profit-maximizing market actors who find greater market advantages by being located in industrial districts than if located apart. The prototypical neo-Marshallian industrial district found in the Third Italy consists of multiple agents: small family firms that expand or contract in various business arrangements with other firms, employees who may switch or hold jobs in multiple firms, and the local political district that establishes policies and practices. Profit maximization is but one of several motivations, which it shares with a mastery of craft, pride of place, and external recognition by consumers of luxury and income-elastic products. Agents are more difficult to isolate in the case of an innovative milieu because its constitutive networks engage many different institutions in a structure where specific roles go unremarked, although such institutions are said to be motivated by pride in their region and their efforts to promote innovation. Individual market and political agents operating lower down, at the level of the local economy, are seen as part of a "territorial production system" that interacts with and depends upon the innovative milieu, but these generally go unacknowledged as its core agents. Industrial clusters are essentially populated by firms and firm branches of all sizes and forms of corporate organization as their principal agents. It is often the case that the cluster itself acquires agent status and takes on a loose organizational form of institutional governance to promote the cluster, but this varies widely, with fewer cluster organizations in Anglo-Saxon and more in continental European and Asian clusters. Return on investment, productivity growth, and market share are the principal agent motivations.

Looking at interactions, one finds that Marshallian districts are knit together by supplier linkages and often unconscious flows of knowledge, ideas, and workers (i.e., knowledge spillovers) among firms located nearby. Neo-Marshallian industrial district theory puts due emphasis on long-term, trust-based collaborative networks of small firms (and supporting organizations) that promote an easy exchange of tacit knowledge, joint purchase of materials, and joint initiatives to gain access to technical or financial services. The literature on innovative milieus stresses the importance of mainly informal social relationships at the regional level, which promote collective learning and, thus, the innovation capacity of companies. Collective learning is based on three mechanisms of knowledge transmission (see, for instance, Keeble 2000): (i) intraregional mobility of labor, (ii) spin-offs, and (iii) formal and informal networks. It is noteworthy that the milieu literature specifically considers the role of extra-regional linkages. Such links to the outside world are more often than not formal in nature and are considered crucial for gaining access to complementary knowledge, markets, and technologies that are not available within the limited context of the milieu. The industrial cluster concept considers companies' links to demanding customers and suppliers, as well as vertical forms of exchange and interaction among companies. Both input-output relations and flows of knowledge are discussed in Porter's conceptual cluster model.

3.3 Industries and Innovation

Marshall did not specify the full range of nineteenth-century industries, products, or technologies expected in his industrial districts, but it seems clear that any industrial process under way in England during that period could benefit from being located in a district. In short, the market interactions in Marshallian districts apply to all industries. Contrast this to the neo-Marshallian industrial districts of the Third Italy, which are very highly focused on mature industries and a more limited range of goods, markets, and production technologies. High-quality personal or household goods, often fashion- and design-intensive products, are intended for international markets of discerning consumers. Such goods are often craft-based or limited in production, which requires highly skilled artisans and quality-conscious production that takes advantage of incremental improvements in basic technologies. In comparison, the very idea of the innovative milieu hinges on the defining importance of high technology, whose reemergence in the 1980s was seen as necessary to revive old (Maillat et al. 1997) and propel new production centers. Apart from the stress placed on links between high technology and industrial innovation, there appear to be no specific industries or markets implied. Industrial clusters are considered relevant to any industry or product where competitive forces require producers to enjoy MAR or Porterian diamond advantages. These are most acutely felt by firms competing in international markets, although locally competing firms are also said to benefit from cluster advantages.

In his analysis of British industrial districts in the nineteenth century, Alfred Marshall is remarkably clear about the sources and types of innovation arising in highly localized industries. Innovation in Marshallian districts is seen to be essentially underpinned by a rapid diffusion of novel ideas and best-practice solutions, spillovers of trade secrets, and a large stock of industry-specific knowledge that is constantly improved through combinations of the ideas of colocated firms and intergenerational knowledge transfer. Interestingly, Marshall seems to employ a rather broad definition of innovation that encompasses not only new products and processes but also "inventions and improvements . . . in general organization of the business . . ." (Marshall 1920, p. 225), that is, organizational innovations. Innovation in the neo-Marshallian industrial districts of the Third Italy is mainly incremental in nature (Asheim 2000). Small firm size, specialization in traditional industries, and little investment in formal research and development are seen as the main reasons for modest levels of technological innovation activity and radical innovation. Firms in Italy's neo-Marshallian industrial districts tend to rely more on improving product quality, upgrading, and incremental process innovation (Rabellotti et al. 2009). Innovation activities are highly collaborative in nature (see section "Core Concepts") and are typically oriented toward craft refinement and perpetuation as well as toward enhancing flexible organizational structures. Protagonists of the innovative milieu concept stress both the importance of local collective learning processes (see section "Actors and Interactions") and the inflow of extra-local competences and complementary knowledge about technologies and markets. Innovation in milieus is essentially – although not exclusively – about the regional adoption and exploitation of knowledge, technologies, and innovations generated elsewhere. Innovation in industrial clusters is the outcome of the working of MAR externalities and conscious actions undertaken by firms, such as pre-commercial joint ventures and the acquisition of knowledge from a variety of sources. Elements of the diamond are seen to have substantial innovation-enhancing effects. Demanding customers, for instance, force cluster firms to carry out permanent improvements and innovation. Knowledge exchange and joint development projects with supplier and related industries provide essential inputs for innovative activities, and strong local competitors perform a critically important role in generating high pressure to innovate. Taken together, these factors and conditions are supposed to constitute fertile ground for continuous improvements and more radical innovations in products, processes, and organizations.

3.4 Environment, Competition, and Cooperation

The concepts under consideration here share an emphasis on the role that the environment, or more precisely the cultural and institutional context, can play in regional development, competitiveness, and innovation. Marshall employs the notion of "industrial atmosphere" to highlight the role of social and cultural factors in supporting localized knowledge flows and industrial development. The colocation of similar firms in the same community implies that in Marshallian industrial districts "the secrets of industry are in the air." A strong local cultural identity and shared industrial expertise are seen to form essential institutional foundations of the evolution of Marshallian industrial districts.

The literature on neo-Marshallian industrial districts clearly supports this view. Soft noneconomic factors such as shared values, norms, trust, and collective identity are seen as fundamentally important to the economic success of industrial districts. The boundaries between the local community and industry are porous and often difficult to identify. Social and economic relations tend to be highly interwoven. Mutual trust and identification with the region and the products of a district are regarded as of utmost significance. These factors enable and stabilize different types of collaboration and allow for a fruitful combination of cooperation and competition (see below).

In a very similar vein, the milieu approach stresses the importance of the sociocultural dimension of innovation and development. A set of common values and shared visions (such as an orientation toward long-term development goals instead of short-term profit) and a willingness to collaborate reflect sound levels of regional cohesion and underpin high rates of innovation. It is important to note, however, that the potential negative effects of too much cohesion on the long-term innovation and adaptation capacities of regions are also considered in the milieu literature.

The industrial cluster concept proposed by Porter does not totally ignore noneconomic factors, but they are much less emphasized in his analysis when compared with the other approaches. He notes that trust and social capital are prerequisites for close interaction among cluster actors: "Social glue binds clusters together, contributing to the value creation process. Many of the competitive advantages of clusters depend on the free flow of information, the discovery of value-adding exchanges or transactions, the willingness to align agendas and to work across organizations, and strong motivation for improvement. Relationships, networks, and a sense of common interest undergird these circumstances. The social structure of clusters thus takes on central importance" (Porter 1998, p. 225).

One finds strong differences among the key concepts under review in this chapter when it comes to determining the relative importance of competition and cooperation as coordination mechanisms and key sources of competitiveness. Marshallian industrial districts rely on both competition and cooperation. In Marshall's view, it is competition that is the key driving force of industrial districts, while district benefits in knowledge creation and innovation are a result of collaboration (Newland 2003). According to Marshall, cooperation can take two forms: it may be conscious and intentional (an example is the formation of industry associations) or unconscious and automatic (knowledge spillovers). In neo-Marshallian industrial districts, firms' willingness to cooperate is of critical importance for innovation and competitive advantage. In Third Italy's districts, "a complex mix of competition and cooperation" (Brusco 1990, p. 1) tends to prevail.
"The efficient co-ordination of the district's activities and the promotion of dynamic growth is not simply a product of the unfettered operation of classic competitive market principles; on the contrary, what is at work is a complex amalgam of both competitive and co-operative principles . . . co-operation is at least as important as competition for organizing the district" (Sengenberger and Pyke 1992, p. 16).

Protagonists of the innovative milieu concept also emphasize the balance between competition and cooperation. The nature of this balance is, however, not clearly specified, and in most contributions to the milieu school, the cooperative dimension seems to be rated as more important than the competitive one (Newland 2003). The industrial cluster concept – at least Michael Porter's version of it – does not emphasize the role of cooperation to the same extent. On the contrary, it is fierce competition, not cooperation, that determines competitive advantage. Porter and Ketels (2009) even argue that a large number of cluster benefits occur simply due to colocation, that is, for most benefits to unfold, no conscious and active collaboration by cluster firms is required. This does not mean, however, that cooperation is completely unimportant in Porter's analysis. He recognizes some advantages in vertical forms of exchange, while cooperation between competitors is seen as harmful. "Clusters clearly represent a combination of competition and cooperation. Vigorous competition occurs in winning customers and retaining them . . . Yet cooperation must occur in a variety of areas . . . Much of it is vertical, involves related industries and is with local institutions. Competition and cooperation can coexist because they occur on different dimensions and between different players; cooperation in some dimensions aids successful competition in others" (Porter 1998, p. 222). While cooperation is important, it does not have the same significance as competition. This argument is clearly supported by the fact that cooperation is not part of Porter's diamond, while competition is the most crucial element of the diamond model because of the supposed stimulating effect it has on the others.

4 Conclusions

In this chapter, we have provided an overview of what is known about neo-Marshallian industrial districts, innovative milieus, and industrial clusters. From the 1970s onward, protagonists of these three concepts have substantially enhanced our understanding of the main sources of regional competitiveness and innovation. In this concluding section, we move beyond the past achievements made by adherents of the three concepts and glance at the potential future of the notions under consideration. In our view, two aspects of future development deserve attention. The first concerns theoretical challenges; the second is the applicability of the concepts to contexts and environments other than those from which they emerged.

A key issue for future theoretical work on neo-Marshallian industrial districts, innovative milieus, and industrial clusters will be to provide a more dynamic view of spatially concentrated industries and their institutional underpinnings. Although efforts are under way to understand their evolution, the literatures on clusters and on neo-Marshallian industrial districts in particular have been criticized sharply for relying too heavily upon static analysis and saying little about the development of regional collections of firms and industries over extended time periods. Consequently, a proper conceptualization of the evolution and transformation of agglomerated industries is urgently needed and should be a core topic of future research.

What is the potential future of the three concepts, assessed by their transferability to regions and countries other than those from which they emanated? Given that the cluster notion provides a broader framework capable of capturing very different forms of geographical concentration, one might argue that it will have the edge over the neo-Marshallian industrial district and milieu concepts. We see the cluster concept as the most fruitful one for cities and regions in Europe, North America, and Oceania, and continuing so in the near future, but it is unclear whether it will apply as convincingly in the more distant future or to the rapidly emerging economies of Latin America, Asia, or perhaps Africa. Is it a foregone conclusion that clusters as we now know them will arise in clearly identifiable forms in East Siberia, Mumbai, Jakarta, Amazonas, or Szechuan by 2050? Or will local circumstances – as was the case for neo-Marshallian industrial districts, innovative milieus, and industrial clusters – produce a rich variety of stylized models of local economic development to meet new challenges? Emergent countries may eventually come to differ rather considerably from the so-called Western models that presently operate within a relatively narrow range of characteristic democratic and market institutions. As the world's economies struggle to adjust to new monetary regimes, altered financial regulations, and impending resource frontiers, one cannot be certain how the systems of global production and international markets may change or how their knock-on effects may require the further repositioning of urban and regional economies.

References

Arrow K (1962) The economic implications of learning by doing. Rev Econ Stud 29(3):155–173
Asheim B (2000) Industrial districts: the contributions of Marshall and beyond. In: Clark G, Feldman M, Gertler M (eds) The Oxford handbook of economic geography. Oxford University Press, Oxford, pp 413–431
Aydalot P (1988) Technological trajectories and regional innovation in Europe. In: Aydalot P, Keeble D (eds) High technology industry and innovative environments: the European experience. Routledge, London, pp 22–47
Aydalot P, Keeble D (1988) High-technology industry and innovative environments in Europe: an overview. In: Aydalot P, Keeble D (eds) High technology industry and innovative environments: the European experience. Routledge, London, pp 1–21
Becattini G (1979) Dal settore industriale al distretto industriale. Alcune considerazioni sull’unità di indagine dell’economia industriale. Riv Econ Polit Ind 5(1):7–21. (Reprint of a new version in English: Sectors and/or districts: some remarks on the conceptual foundations of industrial development. In: Goodman E, Bamford J (eds) (1989) Small firms and industrial districts in Italy. Routledge, London, pp 123–135)
Becattini G (1990) The Marshallian industrial district as a socio-economic notion. In: Pyke F, Becattini G, Sengenberger W (eds) Industrial districts and inter-firm cooperation in Italy. International Institute for Labour Studies, Geneva, pp 37–51
Becattini G (2003) Industrial districts in the development of Tuscany. In: Becattini G, Bellandi M, Dei Ottati G, Sforzi F (eds) From industrial districts to local development: an itinerary of research. Edward Elgar, Cheltenham, pp 11–28. (Reprint of Becattini G (1978) The development of light industry in Tuscany: an interpretation. Econ Notes 2(3):107–123)
Benko G, Desbiens C (2004) French economic geography: introduction to the special issue. Econ Geogr 80(4):323–327
Brusco S (1990) The idea of the industrial district: its genesis. In: Pyke F, Becattini G, Sengenberger W (eds) Industrial districts and inter-firm cooperation in Italy. International Institute for Labour Studies, Geneva, pp 128–152
Camagni R (1991a) Introduction: from the local ‘milieu’ to innovation through cooperation networks. In: Camagni R (ed) Innovative networks: spatial perspectives. Belhaven Press, London, pp 1–9
Camagni R (1991b) Local ‘milieu’, uncertainty and innovation networks: towards a new dynamic theory of economic space. In: Camagni R (ed) Innovative networks: spatial perspectives. Belhaven Press, London, pp 121–144
Crevoisier O (2004) The innovative milieus approach: toward a territorialized understanding of the economy? Econ Geogr 80(4):367–379
Crevoisier O, Maillat D (1991) Milieu, industrial organization and territorial production system: towards a new theory of spatial development. In: Camagni R (ed) Innovation networks: spatial perspectives. Belhaven Press, London, pp 13–34
Cruz SCS, Teixeira AAC (2010) The evolution of the cluster literature: shedding light on the regional studies–regional science debate. Reg Stud 44(9):1263–1288
Glaeser EL, Kallal HD, Scheinkman JA, Shleifer A (1992) Growth in cities. J Polit Econ 100(6):1126–1152
Guerrieri P, Iammarino S, Pietrobelli C (eds) (2001) The global challenge to industrial districts. Edward Elgar, Cheltenham
Iammarino S, McCann P (2006) The structure and evolution of industrial clusters: transactions, technology and knowledge spillovers. Res Policy 35(7):1018–1036
Keeble D (2000) Collective learning processes in European high-technology milieux. In: Keeble D, Wilkinson F (eds) High-technology clusters, networking and collective learning in Europe. Ashgate, Aldershot, pp 199–229
Landström H (2005) Pioneers in entrepreneurship and small business research. Springer, New York
Maillat D, Léchot G, Lecoq B, Pfister M (1997) Comparative analysis of the structural development of milieu: the watch industry in the Swiss and French Jura arc. In: Ratti R, Bramanti A, Gordon R (eds) The dynamics of innovative regions: the GREMI approach. Ashgate, Aldershot/Brookfield, pp 109–137
Marshall A (1920) Principles of economics, 8th edn. Macmillan, London
Martin R, Sunley P (2003) Deconstructing clusters: chaotic concept or policy panacea? J Econ Geogr 3(1):5–35
Newland D (2003) Competition and cooperation in industrial clusters: the implications for public policy. Eur Plan Stud 11(5):521–532
Piore M, Sabel C (1984) The second industrial divide. Basic Books, New York
Porter ME (1990) The competitive advantage of nations. Harvard Business Review Press, Cambridge
Porter ME (1998) On competition. Harvard Business School Press, Boston
Porter M, Ketels C (2009) Clusters and industrial districts: common roots, different perspectives. In: Becattini G, Bellandi M, De Propris L (eds) A handbook of industrial districts. Edward Elgar, Cheltenham, pp 172–183
Rabellotti R, Carabelli A, Hirsch G (2009) Italian industrial districts on the move: where are they going? Eur Plan Stud 17(1):19–41
Romer P (1986) Increasing returns and long-run growth. J Polit Econ 94(5):1002–1037
Sengenberger W, Pyke F (1992) Industrial districts and local economic regeneration: research and policy issues. In: Pyke F, Sengenberger W (eds) Industrial districts and local economic regeneration. International Institute for Labour Studies, Geneva, pp 3–29
Simmie J (2005) Innovation and space: a critical review of the literature. Reg Stud 39(6):789–804
Storper M (1997) The regional world. Guilford, New York

Section VI: Regional Policy

The Case for Regional Policy: Design and Evaluation

Carlos R. Azzoni and Eduardo A. Haddad

Contents
1 Introduction
2 The Case for Regional Policies: Economic Considerations
3 The Case for Regional Policies: Noneconomic Reasons
4 What Sort of Regional Policy?
5 How to Assess the Effectiveness of Regional Policies
6 Conclusions
7 Cross-References
References

Abstract

Is regional policy necessary? If so, under what circumstances? The first part of the chapter discusses the rationale behind the existence of (or the need for) regional policies in general. Cases in which excessive concentration or inequality hinders national economic growth are natural candidates for regional policies. If concentration and inequality favor national growth and competitiveness, regional interventions call for a different sort of argument, such as national unity or cohesion. Any regional policy has sectoral and national consequences, just as any sectoral or national policy has regional consequences. Should public authorities engage in explicit regional policies (place-based) or in sectoral or social policies (people-based) with desirable regional implications? Finally, if regional policies are implemented, how should they be evaluated? The chapter discusses these topics.

Keywords
Regional inequality · Regional policy · Regional identity · Policy justification · Policy evaluation

C. R. Azzoni (*) · E. A. Haddad
Department of Economics, University of Sao Paulo, Sao Paulo, Brazil

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_137

1 Introduction

Human activities are not evenly dispersed over space. In fact, the world is not flat, but spiky. Satellites have made it possible to observe illuminated areas, an indicator of human presence. The images produced by the US National Oceanic and Atmospheric Administration reveal that only a small portion of the territory shows the presence of some kind of human activity: around 8% in Brazil, 10% in Nigeria, 13% in South Africa, and 31% in the USA. Since these are countries with large territories, such concentration of human activities is to be expected. However, even in smaller countries such as Uruguay (8.5%), similar situations are observed. The same pattern of concentration can be observed in terms of economic activity: 50% of Chilean GDP is produced within the metropolitan area of Santiago; the region around Athens holds a similar share of Greek GDP; 55% of Brazilian GDP comes from the Southeastern region, which accounts for only 11% of the country’s area; and 55.8% of Italian GDP comes from the northern part of the country. The resource basis (mineral deposits, soil fertility, weather and climate conditions, topographic profiles, etc.) is heterogeneous and tends to be more so in countries with large north-south spreads, such as Chile, Brazil, the USA, Canada, Sweden, and Iceland. This heterogeneity of the natural basis helps explain the disparities but is certainly only part of the explanation. Another aspect is the concentration in large cities: typically, the first three or four cities in a country account for an impressive share of total population and of sophisticated activities. The urban dimension of geographical concentration is becoming more and more relevant, as the share of tertiary activities is increasing at a fast pace all over the world. Nowadays, such activities account for over 70% of GDP in most countries, including LDCs. Given that these activities are typically urban, regional and urban concentration become two sides of the same coin.

The question to be posed is whether one should be concerned with the fact that human activity, especially income, is concentrated in some parts of the territory, as all evidence indicates. Most studies deal with the dispersion of per capita income levels across regions within countries, which is one dimension of the disparities. Part of these differences could be offset by differences in the cost of living, resulting in lower inequality in purchasing power. The evidence shows, however, that even after taking differences in cost-of-living levels into consideration, disparities in the purchasing power of income remain. Besides income, other dimensions could be added, such as differences in access to opportunities (investments, careers, etc.) and to public services (health, education, security, etc.), since they are clearly vectors of regional inequality.

One important dimension of inequality is the difference in the absolute levels of living conditions across regions. Although the different levels of per capita income across Swedish regions could raise political discomfort, even the poorest regions in the country enjoy a decent minimum level of public services and income. This is quite different in countries in which the richer regions may be doing well while the poorer regions concentrate large numbers of people living well below minimally acceptable conditions. A case of discomfort in Sweden becomes a case of humanitarian rescue in poor countries. Of course, the sense of urgency is very different in these cases.

There are poor regions with small shares of national population, such as the Brazilian Amazon region, the areas outside the main cities of the Gulf countries, or the Australian outback. Even if living conditions in those areas are far below the national average, their political importance is lower, and the pressure for the design of regional policies may be negligible. On the other hand, if poor regions are densely populated, such as the southern parts of Italy or Spain, or the Brazilian northeast, the perception of falling behind assumes a different dimension, expressed in votes and political support.

The spread of human activity within countries resembles the same phenomenon between countries, and the same factors can be associated with both cases. However, countries are national entities, with the corresponding institutions and sovereignty. Regions within countries face similar problems but lack the tools to design and promote countervailing measures – at least not with the same strength as national governments. Thus, although any country can design development policies and use the full toolbox to implement them, regional authorities, where they exist, typically lack access to the necessary tools. Even when regional governments intend to revert the negative situation of their regions, they lack the capacity to implement the necessary measures.

In the face of important and long-lasting regional disparities, many countries have decided to tackle the problem and promote regional deconcentration, as in the well-known cases of the Mezzogiorno region in Italy, the Tennessee Valley in the USA, the northeast region in Brazil, and the Andalucía region in Spain (see the chapter on ▶ “Regional Policies in East-Central Europe,” in this volume). Some questions arise, such as the efficacy of the policy instruments in reducing regional disparities. Looking at the present level of disparities in these countries, one would be inclined to assert that they have failed. However, given that the situation that would prevail in the absence of such policies is unknown, it is not easy to assert whether they succeeded or not. This aspect is covered in the last section of this chapter.

Another aspect is the choice of regional over sectoral or social policies. It is true that poor regions have a large concentration of poor people. However, the question arises whether the region is poor because it has poor people or has poor people because it is poor. The second case calls for policies targeting the region (place-based policy); the first indicates a case for policies directed to people (people-based policy). Sometimes sectoral policies will have regional impacts well beyond their sectoral objectives. One such case is the long-lasting and temporally consistent investment by the Brazilian Ministry of Agriculture in technological advancement and institutional building in the ranching and farming sector. This successful program, originally directed at solving the country’s food supply problem, ended up bringing the country into the spotlight as one of the largest exporters of food products and transformed its mid-west, a previously sparsely populated and poor area.
Other sectoral policies may have negative spillover effects in some regions, such as the national import-substitution industrialization policies undertaken in various Latin American countries, which reinforced the primate role of a few industrialized regions.

These topics compose a complex scenario for decision-makers: whether disparities are inevitable and instrumental to national growth, and should be accepted as a by-product of national affluence goals, or whether they should be addressed by policymakers, either because they compromise the achievement of national goals or because they are unacceptable on humanitarian and political grounds. This chapter deals with these issues, starting with economic considerations related to the generation of disparities in the process of national economic growth and the possible trade-off between regional disparities and national efficiency and growth. Noneconomic reasons are discussed next, highlighting some interesting cases, such as the reunification of Germany and the building of the European Union. Whatever the motivation for the discomfort with the presence of regional disparities in a country, any attempt at intervention has to be concerned with effectiveness, which brings into the discussion the available alternatives for government action. This part of the chapter deals with the question of choosing among place-based, people-centered, or sectoral policies to tackle the problem. Finally, a discussion of ways of assessing the effectiveness of policies is presented in the last section of the chapter.

2 The Case for Regional Policies: Economic Considerations

Regional disparities, represented either by the concentration of economic activity or by inequality in personal welfare indicators across regions (per capita income being the most used), can be an issue for economic and/or political or humanitarian reasons. It might be the case that excessive concentration or inequality poses a restraint on national growth or capital accumulation. In this situation, reducing regional disparities is consistent with improving national growth. On the other hand, if regional concentration favors national efficiency and growth, any measure to reduce disparities might hinder national objectives. In the first case, support for regional policies is easily obtained. In the second case, some noneconomic argument is needed, since the reduction in disparities will come at a cost in terms of national objectives. Determining which is the case is not an easy task. Favoring a specific region with tax incentives, for example, could be harmful for national growth in the short run, but cumulative effects over a period of some years could provide a favorable symbiosis and make it a good choice.

The original studies of the 1950s by Gunnar Myrdal and Albert Hirschman pointed to cumulative causation processes that would increase personal and regional inequality, since the process of national growth is associated with backwash and polarization effects. These ideas were presented with solid argumentation but lacked the formal modeling and empirical evidence that any study would be required to provide nowadays. The theme was revisited in the 1970s by Nicholas Kaldor in a lecture in which he presented, in verbis, the economic mechanisms behind the backward situation of Scotland within the UK. His ideas were put into a proper model by Robert Dixon and Anthony Thirlwall in 1975. This cumulative causation point of view received important theoretical contributions at the turn of the twenty-first century, within the new economic geography school of thought, especially in the works of Paul Krugman, Masahisa Fujita, and Anthony Venables. These new studies have given rigorous formalization to the ideas and concepts of this line of thought. Within this general equilibrium theoretical framework, economic agents are affected by centripetal and centrifugal influences. By choosing a location in a peripheral area, firms face lower land and labor costs but might incur higher transportation costs in delivering their products to the core regions. If they opt for the core region, they face higher costs, including congestion costs, but they benefit from agglomeration economies. These include more productive workers, as the economic and social milieu of the core region also attracts skilled workers and is, therefore, more conducive to efficiency and growth. The balance of these agglomeration forces, plus transportation costs, vis-à-vis lower costs outside the core determines firms’ choices and conditions their performance. As the evidence indicates, centripetal forces tend to surpass centrifugal influences, leading to the concentration of economic activity in the core. Since the balance of forces favoring the core area is strong, any attempt to counteract them will face huge difficulties in achieving a more balanced distribution of economic activity within countries. (Brakman et al. (2005) provide a thorough discussion applied to the European Union.)

Within neoclassical growth models, the operation of market mechanisms would take care of disparities. As poor regions present lower wage levels, labor will move to richer regions in search of higher remuneration, thus reducing population in the poorer areas. On the other hand, as capital is scarce in these parts of the country, its return will be higher there, thus attracting investments. The combination of these factors – the exit of labor and the inflow of capital – would increase per capita income in the poorer regions. In such a context, regional disparities would vanish if the market mechanisms were allowed the freedom to exert their influence, meaning no room for government intervention.

The dispute between these two opposite ways of looking at the formation of regional disparities in the process of national growth naturally attracted the attention of many researchers. The debate had to be settled on empirical evidence. In the 1950s, the use of statistics was becoming more popular, and the seminal cross-country study by Simon Kuznets on the relationship between personal income inequality and national growth became a benchmark. His conclusion, later supported by other empirical studies, was that personal income inequality increases at the initial stages of development but decreases at higher levels of income. Thus, national growth would solve the problem of personal inequality in the long run but would produce more inequality at intermediate stages of per capita income. Naturally, this idea was transposed to the regional dimension, in an empirical study developed in 1965 by Jeffrey Williamson, who arrived at the same famous inverted U-shaped curve for regional inequality, which is thus considered a by-product of growth. Once countries surpass a certain per capita income level, disparities would decrease. The policy orientation derived from this interpretation is clear: countries should concentrate on promoting national growth in order to reduce regional disparities, although some increase would be expected in the process.

The empirical evidence on these aspects is not conclusive, given the variety of cases and methods applied. There are results indicating that spatial concentration induces national growth and vice versa, and others suggesting that concentration and growth are mutually reinforcing, i.e., causality runs in both directions. The same reasoning can be applied to urban concentration within countries, another form of spatial disparity, and, again, the empirical evidence from cross-country studies is not conclusive: a positive association between national growth and the share of the population living in a country’s main urban concentration is generally found, but the association weakens at higher levels of urban concentration and may even become negative. That is, concentration in large cities is good for growth, but excessive concentration might be harmful, reinforcing the inverted U-shaped relationship between disparity and growth presented earlier in this chapter.

Looking at the problem from its inverse, the idea of convergence among countries and regions became fashionable in the 1990s. If lagging regions grew at higher rates than richer regions, inequality would be reduced as national growth occurs. There are two ways of looking at the resulting level of inequality. A more restrictive situation is when all regions converge to the same per capita income level (absolute convergence); a less restrictive one is when each region converges to its own steady-state level of income (conditional convergence). In the first case, inequality vanishes; in the second, there will be an equilibrium level of inequality. In other words, even in a steady-state situation, per capita income levels will not be the same across regions. The empirical evidence shows that absolute convergence is seldom observed, but conditional convergence is the typical result found in almost all studies, with an amazing uniformity of the estimated rate of convergence at around 2% per year. Thus, the conclusion is that growth will diminish inequality, but the end situation will still present a certain degree of regional inequality. And the faster the estimated rate of convergence, the closer the present situation is to the equilibrium level of inequality.

Another way of looking at the effects of concentration on national growth is through the evolution of regional productivity. Considering that regions compete with each other to serve national (and international) demand for goods and services, changing the observed status of regional inequality requires that regions with lower shares in GDP exhibit higher productivity levels and faster productivity growth. In this way, they become more competitive in fighting for their share of demand. This kind of study is less frequent, since it requires data that are scarce at the regional level. If the estimated productivity levels replicate the traditional inequality levels observed in the countries, there is less hope for reducing inequalities in the future.

A major part of economic activity takes place in cities. This shows up in the share of population living in cities and the share of GDP coming from them, associated with the growing quantitative and qualitative importance of tertiary activities. Within the system of cities of many countries, the largest cities are more competitive, since they attract the best firms and the best workers, as a vast literature on the existence of a wage premium in large cities indicates. The premium is associated with the higher productivity of firms in large cities, is higher for workers with a college degree, and grows with city size. Large cities attract skilled residents and more productive firms and tend to become more competitive over time. Given the increasing importance of urban activities in the economy, this is another way of suggesting that more agglomeration increases national productivity. The empirical ground in this area is paved with different studies, some showing that agglomeration effects on growth are quantitatively important and pervasive, with differences between developed and undeveloped countries and between countries with large and small urban populations; in some countries, cities with populations under three million appear more conducive to growth.

Considering the surveyed evidence, one could find reasons for supporting regional policies from a pure efficiency standpoint in some cases, but the conclusion should not be applied to all cases. However, the evidence could also be used in favor of the idea that impeding concentration would impair national efficiency and/or growth, since most results are disputable on methodological grounds. Overall, the literature is not conclusive about the role of regional disparities in the national economy, leaving the decision on whether or not to embark on regional policies to noneconomic factors.
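The convergence results summarized above typically come from growth-initial-income regressions of the Barro type. As a stylized point of reference (the notation below is ours, not taken from the studies cited in this chapter), the conditional beta-convergence specification can be written as:

\[
\frac{1}{T}\,\ln\!\left(\frac{y_{i,T}}{y_{i,0}}\right)
\;=\; \alpha \;-\; \beta \,\ln y_{i,0} \;+\; X_i'\gamma \;+\; \varepsilon_i ,
\qquad
\beta \;=\; \frac{1 - e^{-\lambda T}}{T},
\]

where \(y_{i,0}\) and \(y_{i,T}\) are the initial and final per capita incomes of region \(i\) over a span of \(T\) years, and \(X_i\) collects regional controls that shift each region’s steady state. Dropping \(X_i\) gives the absolute convergence case; the implied speed of convergence \(\lambda\) is the quantity estimated at around 2% per year in the literature mentioned above.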

3 The Case for Regional Policies: Noneconomic Reasons

Regardless of the aspects previously discussed concerning the reasons for intervention at the sub-national level, the fact is that regional policies are to be found in almost every country, under different formats and extents. It helps to compare countries and regions within countries in this respect. Typically, the right of countries to implement economic growth and development policies is recognized as a noble objective and even constitutes an obligation for any government. One could base the decision to embark on such national policies just by looking at the inner conditions of the economy, considering the lack of employment generation, low capital formation, etc., or by comparison with other economies of the same kind, geographical area, ethnic origins, etc. Even if national growth and development policies can be criticized regarding their necessity, design, and implementation, the idea of countries fighting to improve their living conditions is taken for granted in international forums.

The application of these values to regions within countries is not automatic, however. In the first place, the toolbox of regional authorities is limited in comparison with that of the national government. Regions are more open to trade than nations, and this imposes limitations on interregional trade control that do not apply to international trade. Regional barriers are elusive, and the use of the same currency makes control difficult. And, most importantly, the national economy is considered a system of competition between its regions, meaning that progress in one part might be understood as undermining the chances of success in other parts of the territory. Therefore, obtaining consensus for protecting particular regions is a difficult political process, adding substantial transaction costs (the chapter by Hartwell in this volume discusses ▶ “Place-Based Policies: Principles and Developing Country Applications”).

This regional fight for shares tends to lead to inertia, unless some real challenge shows up that organizes the political system to support a particular regional policy. Threats of separatism fall in this realm, as in the recent well-known cases of the north of Italy or Cataluña, in Spain. The fear of losing control over the Amazon area led the Brazilian government to implement a free export zone in Manaus, in the middle of the jungle. The development of strategic areas near contested frontiers is another such case. An excellent example of political support for regional policy was the reintegration of East Germany in the 1990s. There was an enormous gap in living conditions, productivity, etc. between the two areas, which once again became parts of the same country. Support for a massive reintegration program was immediate, many mechanisms were implemented, and huge amounts of resources were applied to bring the new territory, now a region like all others in Germany, to an acceptable level of conditions. The total amount invested was enormous, and an additional tax, the Solidaritaetszuschlag or solidarity surcharge, had to be introduced to help finance the expenditures. It was initially 7.5% and, since 1998, has been 5.5% of the amount of income, capital gains, and corporate tax to be paid. Nonetheless, given the historical situation and the political moment, adherence to such a massive program was fast and consensual.

The European Union is probably the most relevant case of regional integration. Although economic reasons were fundamental in the motivation for the creation of a unified Europe, including the expansion of the market and the generation of scale economies, the idea of cohesion prevails. As stated in the Union’s documents: “As an expression of solidarity between the EU Member States and their regions, economic and social cohesion aims to achieve balanced socio-economic development throughout the EU.” Balanced socioeconomic development applies not only across countries but also to regions within countries. Cohesion in the European case must be considered against the rich history of conflicts among countries, cases of the creation of federations of countries with subsequent fragmentation, etc. In any case, although growth, efficiency, and competitiveness at a broader scale are the main goals, paying attention to “economic and social cohesion” is taken as a condition for achieving those higher goals (for a thorough assessment of the lessons drawn from the European Union, see the chapter by Cuadrado-Roura in this volume).

Regional identities cannot be ignored in pursuit of the higher goals of integration, either at a multinational scale, as in the EU, or within any country, small or large. They are strong, for many historical reasons, and they show up in any form of integration. Interesting studies show that East Germans still behave differently from West Germans in terms of financial investment decisions and voting preferences. Political polarization manifests itself in the regional distribution of votes, which has recently presented a marked regional dualism in Europe, in the clear polarization between poor and rich regions in the 2018 Brazilian national election, in the stalemate around the Brexit referendum, and in many other cases.

Volume and density play important roles in surfacing the relevance of regional disparities and transforming them into a relevant political issue.
Lagging regions with small fractions of the national population might not become a “regional problem,” regardless of the level of inequality. On the other hand, heavily populated lagging areas do present themselves as unequivocal issues to be considered in any form of policy. This comes about both through the sheer size of the population affected (a large number of poor people is too evident to ignore) and, of course, through the voting power of the areas, wherever democracy is in action. As an example, the Brazilian northeast region, with 27% of the national population, is a more politically relevant problem as a lagging region than the north region (the Amazon), with only 8.7% of the population, scattered over a vast extension of territory.

At this point, it is relevant to distinguish two situations regarding the absolute level of economic welfare of citizens, considering income, health, security, and all forms of quality of life. In rich countries, even if the gap in living conditions between the top and bottom regions is wide, the absolute level in the latter can still be quite acceptable. This is totally different in low-income countries. There, on top of the gap, the absolute levels are minimal, even in the top regions. This takes the case to a different level, approaching humanitarian concerns. The flow of emigrants from Eastern and Southern Europe, the recent waves of refugees from Africa to Europe, and the contentious case of the wall dividing the USA and Mexico (and, for that matter, all of Latin America), meant to stop a continuous flow of people seeking a better life in America, are stark examples of the magnitude of the consequences of inequality between areas.

4 What Sort of Regional Policy?

As the studies mentioned above indicate, regardless of the conclusions of academic work, governments do embark on regional policy, echoing the political claims coming from lagging regions, especially those with large voting power. The arsenal available to design and implement such forms of intervention is vast and varied, and almost every weapon has been used in some situation. Before dealing with explicit forms of spatial intervention, it is worth noting that any policy has spatial consequences. Since economic activity has to take place at some point in space, any form of government intervention will necessarily have a spatial bias. By the same token, any nonspatial policy will have spatial consequences: export promotion policies will benefit the regions producing the favored product; import substitution measures will hurt the previous producers of the affected commodity; tax policies will influence different sectors with different intensities; and, unless activities are evenly distributed in space, the final result will not be space-neutral. The case of Brexit is an excellent example of unintended regional consequences of a major intervention, since the impacts will differ across the countries within the UK. Programs of cash transfers to poor people implemented in different countries, such as Brazil and Mexico, do have spatial effects, as the available studies indicate.

The discussion on the unintended spatial effects of macroeconomic or sectoral policies is vast. Changes in spatially blind sectoral policies may affect the spatial concentration of economic activities and population, depending on the configuration of factor markets, on how factors adjust to changes in sectoral policies, and on the degree of long-run sectoral and locational specificity of production factors. Such is the case of sectoral activities like oil and gas extraction in the USA, mining in Chile, Brazil, and Australia, and investments in infrastructure in the European context. Exchange rate policies defined at the national level can be quite influential in regional terms for exporting and importing sectors. The successful agricultural programs of the Brazilian government ended up reshaping the economic importance of the Brazilian mid-west, which increased its share of national GDP fourfold in 50 years (see the chapter on ▶ “Rural Development Policies at Stake: Structural Changes and Target Evolutions During the Last 50 Years” by Torre and Wallet in this volume). By its overarching nature, environmental policy will necessarily produce different impacts in different regions, given the distribution of natural resources and population in space. Even social policy will have regional consequences: the already mentioned program of cash transfers to poor people implemented by the Brazilian government as a pure social policy ended up producing regional effects more intense than those of any massive regional policy implemented in the country.

There is a vivid debate on whether to invest in people, ignoring their location (people-based or spatially blind policies), or in territorial units (place-based or territorial policies), as illustrated by the chapters by Martins and Garcilazo and by Venables and Duranton in this volume. The first approach is favored by the World Bank, as expressed in its 2009 World Development Report (World Bank 2009). The second point of view is the basis for the European cohesion programs, which focus on place-based interventions. Partridge et al. (2015) offer a discussion of the rationale behind both points of view. At the end of the day, it is all a matter of effectiveness. If place-based policies, considering all the possible distortions they might impose on the functioning of the economy, do succeed in solving the problems they were designed to attack, there is little ground to argue against them. The same goes for people-centered policies. This sets the stage for the last section of the chapter, which deals with assessing the effectiveness of regional programs.

Spatially explicit policies must deal with the fact that lagging regions present some form of weakness that hinders their competitiveness. Therefore, efforts are made to tackle those weaknesses and/or to find competitive advantages of some sort that could be promoted. Export promotion is the most common modality of policy, be it by increasing the productivity of poor regions or by giving them some edge in terms of special treatment. One alternative is the establishment of special economic zones, which are clearly demarcated areas within countries with favorable rules for economic and business activities, involving investment conditions, customs, taxation, and regulation. They can be free trade zones, free ports, export-processing zones, etc. Free export zones, fashionable some years ago and still in operation in many countries, are good examples. There are also cases of free import zones, such as the Brazilian Free Zone of Manaus, in the middle of the Amazon region, connected to the rest of the country only by water or air. It has attracted plants from major world economic groups in electronics, motorbikes, pharmaceuticals, etc.
Other forms include tax abatements, credit facilities, and other advantages to companies locating in the preferred areas.

Investing in infrastructure is another form of implementing regional policy, since, as part of their backwardness, poor regions typically lack infrastructure. Diverting investment flows to benefit these regions can be a way of giving them special treatment. In the European Union, this line of action was used massively, both to increase connectivity and to facilitate access to customers and suppliers. Other, less direct forms of intervention are pervasive, such as helping the formation of local productive arrangements and/or strengthening existing ones, other forms of export promotion or import substitution, and developing and strengthening local institutions.

The most massive experiment in regional policy in recent years is that of the European Union, for which regional policy is “an expression of the EU’s solidarity with less developed countries and regions, concentrating funds on the areas and sectors where they can make the most difference.” It is basically an investment policy, geared to job creation, competitiveness, economic growth, improved quality of life, and sustainable development. The justification involves both economic and noneconomic reasons, for its aim is “to reduce the significant economic, social and territorial disparities that still exist between Europe’s regions,” because these disparities “would undermine some of the cornerstones of the EU, including its large single market and its currency, the euro.” Funds are directed toward boosting small- and medium-sized businesses, research and innovation, the environment, access to digital technology, the development of new products and production methods, energy efficiency and climate change, education and skills, and improved transport links to remote regions.

5 How to Assess the Effectiveness of Regional Policies

It is hard to assess whether the impressive number of plants attracted to the Manaus special zone in Brazil can be tagged as a policy success. If, in the absence of incentives, those plants would have chosen a different location in the country, there is merely a diversion of resources, which involved a cost in foregone tax revenue, for example. These costs have to be weighed against the benefits of developing the region, which may be noneconomic as well, involving geopolitical, environmental, and humanitarian dimensions, among others. Comparing the pros and cons of such endeavors is not an easy task, given the different dimensions involved. The plants could also have chosen another country in which to locate. In that case, the justification would have to be organized in terms of national interests.

Even after concluding that a particular intervention failed, a question remains about the quality of the implementation process. Even if, in theory, a given strategy could be positive, poor performance in the way it was put into practice might have jeopardized the whole effort. It is a matter of having the right institutions, of good quality, which are also unequally distributed over space. The same program can produce different results in different environments, and thus it is a stretch to judge a particular strategy as a failure from its result in a particular case. This adds another layer of difficulty to the assessment of regional policies.

The general approach to ex post policy assessment is based on causal inference. The road map for empirical studies requires defining the population of interest, the treatment variable, the outcome variable, and a moment of the distribution of the outcome variable. The empirical challenge is to identify the causal effect of some (regional policy) shock, measured in terms of the treatment variable, on the outcome variable. The usual setting found in most regional policy assessment studies faces the issue of external validity, i.e., results are not directly transferable to other locations. The causal relationships embedded in the analytical and numerical structures of the models’ estimates are valid only in the context of the evaluation of the impacts of the implemented policy interventions in the environments in which they were implemented. This is the issue of internal validity, which allows identifying the impact of an intervention conducted in a given environment. However, if one wants to forecast comparable impacts of similar policy interventions implemented in other spatial environments, one should use similar modeling structures estimated, nonetheless, with data for that specific region. In a sense, regional policy assessments still fail to deliver robust results that allow policy-makers to learn directly from other countries’ and regions’ experiences without caveats about their specific contexts.

There is a wide range of tools available for assessing the effectiveness of regional policies. Evaluation methodologies based on statistical techniques used in econometrics and quantitative research, usually associated with the estimation of reduced-form equations, that have recently been in vogue include instrumental variables estimation (IV), difference in differences (Dif-in-Dif), regression discontinuity design (RDD), and synthetic control methods (SCM).

Instrumental variable estimation has been used to reveal causal effects in different contexts in which it is important to isolate sources of exogenous shocks to local socioeconomic outcomes, addressing problems of endogeneity and reverse causation. An interesting example is Ganau and Rodríguez-Pose (2019), in their attempt to determine whether the quality of regional institutions affects labor productivity in a set of firms from European countries. They run a regression of firm productivity on the quality of regional institutions. However, regions could have better institutions because of the good quality of their firms. Were that the case, the observation of a positive effect of the quality of institutions on productivity would be misleading. Moreover, productive firms could be attracted to areas with better institutions, in a process of spatial sorting. Again, the coefficient on the quality of institutions would be biased. They solve the puzzle by using the generalized method of moments (GMM), combining internally generated instruments with external instrumental variables. For the latter, they use historical variables such as whether the region belonged to the Roman or Charlemagne’s Empire, whether it was Christianized in 600 AD, and the number of times it changed kingdom between 500 AD and 1000 AD. These variables reflect the environment in the regions well before the period analyzed and, therefore, could not be related to productivity in that period. They find a positive effect of the quality of institutions on productivity, especially for smaller, less capitalized, and high-tech firms.
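As a purely illustrative sketch of the two-stage least squares logic underlying such instrumental variable strategies (the data, effect sizes, and variable names below are invented for the example and are not taken from the study just cited), one can simulate an endogenous regressor and recover the true effect with an instrument:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Synthetic setting: institutional quality is endogenous because an
# unobserved factor (think firm sorting) drives both institutions and
# productivity; a historical variable serves as the instrument.
confounder = rng.normal(size=n)
history = rng.normal(size=n)  # e.g., a pre-period historical trait
institutions = 0.8 * history + 0.5 * confounder + rng.normal(size=n)
productivity = 1.0 * institutions + 1.0 * confounder + rng.normal(size=n)

def ols(y, X):
    """Least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Naive OLS: biased, since institutions and the error share the confounder.
X = np.column_stack([np.ones(n), institutions])
beta_ols = ols(productivity, X)[1]

# 2SLS: the first stage predicts institutions from the instrument; the
# second stage uses the prediction in place of the endogenous regressor.
Z = np.column_stack([np.ones(n), history])
institutions_hat = Z @ ols(institutions, Z)
X_iv = np.column_stack([np.ones(n), institutions_hat])
beta_2sls = ols(productivity, X_iv)[1]

print(f"OLS:  {beta_ols:.2f} (biased away from the true effect of 1.0)")
print(f"2SLS: {beta_2sls:.2f} (consistent, given a valid instrument)")
```

In applied work one would use an econometrics package that also reports valid standard errors and diagnostics (weak instruments, overidentification), as the GMM approach of the study above does, but the identification logic is the one shown here.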

One interesting example of the use of geographical discontinuity is Giua (2017), in her analysis of the impact of regional policy on Italian regions. She uses spatial boundaries to differentiate municipalities belonging to objective 1 and non-objective 1 regions within the country. Considering that these municipalities are in the same neighborhood, it is probable that all observable characteristics are distributed in a well-behaved manner, except for the discontinuous jump between being under objective 1 status or not. This allows separating the effects of the policy from all other sorts of influence, which are controlled for by the observable characteristics. She finds a positive effect of the policy on employment in the targeted regions and that it does not negatively affect the richer regions of the country. Different studies based on econometric approaches involving regression discontinuity design techniques also provide evidence of regional policy effects on other outcome variables. Gagliardi and Percoco (2017) assess the impact of regional policy on NUTS 3 European regions. The idea is to deal with “accidental winners,” that is, regions that would not have been eligible for the policy at the NUTS 3 level, since they were above the minimum income threshold, but were eligible because they belong to an eligible region defined at the NUTS 2 level. The study compares NUTS 3 regions just below and just above the 75% threshold, which are thus similar in observed and unobserved characteristics but were treated differently by the policy eligibility rules. The study finds that the funds received had a positive impact on the economic growth of lagging areas and that the effect was mainly due to the performance of rural areas in the neighborhood of large urban agglomerations.

Appropriate tools are also available to assess the ex ante impacts of regional policies. Input-output and CGE models have been widely used for the analysis of regional policies. Despite their usefulness, they suffer from limitations, especially their limited ability to handle dynamics. Hence, they should be viewed as complements to existing models rather than as replacements. In this sense, modeling integration becomes a major goal to pursue. Some of the shortcomings of such models might be overcome by embedding, for instance, a core CGE model in a broader modeling framework. Isard’s vision of integrated modeling provided a road map for the development of more sophisticated analyses of spatial economic systems. Given their many virtues, if adequately addressed, CGE models are the main candidates for the core subsystem in a fully integrated system. As such, experience with simulation exercises allows forecasting the potential impacts of policy scenarios and interventions never historically experienced, which allows comparing non-policy to policy-applied scenarios.
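To fix ideas on the discontinuity designs discussed above, the following minimal sketch estimates a jump at an eligibility cutoff on synthetic data. The 75% threshold echoes the EU eligibility rule, but the bandwidth, sample, and effect size are invented for the example and do not come from the studies cited:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Running variable: regional GDP per capita as a share of the EU
# average (%). Regions below 75% are eligible ("treated").
income = rng.uniform(50.0, 100.0, size=n)
treated = (income < 75.0).astype(float)

# Outcome: smooth in income, plus an assumed policy jump of 0.5 at
# the cutoff, plus noise.
growth = 0.02 * income + 0.5 * treated + rng.normal(scale=0.5, size=n)

# Local linear regression within a bandwidth around the cutoff, with
# separate slopes on each side; the coefficient on `treated` is the
# estimated discontinuity.
bandwidth = 10.0
keep = np.abs(income - 75.0) < bandwidth
d = income[keep] - 75.0
t = treated[keep]
X = np.column_stack([np.ones(d.size), t, d, t * d])
coef = np.linalg.lstsq(X, growth[keep], rcond=None)[0]

print(f"Estimated jump at the 75% cutoff: {coef[1]:.2f} (true value: 0.5)")
```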
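On the ex ante side, the input-output core on which such integrated CGE systems build can be shown in a few lines. The two-sector coefficient matrix below is hypothetical, chosen only to illustrate the Leontief quantity model:

```python
import numpy as np

# Technical coefficients A[i, j]: input from sector i required per
# unit of output of sector j (hypothetical two-sector region).
A = np.array([[0.2, 0.3],
              [0.1, 0.4]])

# Leontief inverse: total (direct plus indirect) output requirements
# per unit of final demand, from x = (I - A)^(-1) f.
L = np.linalg.inv(np.eye(2) - A)

# Ex ante impact of a policy-induced final demand shock of 100 in
# sector 1.
f_shock = np.array([100.0, 0.0])
print("Output impacts by sector:", np.round(L @ f_shock, 1))
print("Sector 1 output multiplier:", round(L[:, 0].sum(), 2))
```

A CGE model generalizes this fixed-coefficient logic by letting prices, factor supplies, and behavior respond to the shock, which is why the chapter treats it as the natural core of an integrated assessment system.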

6 Conclusions

This chapter has dealt with issues related to the need for regional interventions. A general picture of existing regional disparities across countries was presented, involving dimensions such as income levels, human density, and living conditions. These disparities have prompted governments to produce solutions via different sorts of interventions, of which allowing easier and cheaper financing and applying tax abatements are the most traditional forms. But other forms have been utilized in different contexts, such as investments in infrastructure in poor regions; the creation of zones in which economic agents receive some sort of favored treatment; programs designed to boost exports or to increase the competitiveness of local producers relative to imported products; the promotion of local productive arrangements; programs to boost the innovative capacity of regions; smart city mechanisms; etc.

Considerations on the economic reasons for intervention were presented, highlighting the possible trade-off between efficiency at the national level and regional disparities, involving both regions and urban areas. If the existence of regional disparities and the spatial concentration of economic activities are harmful to national growth or national competitiveness, designing and implementing regional interventions would yield a double dividend, tackling both problems at the same time. However, if concentration and inequality help the economic performance of the country as a whole, support for regional interventions must come from different sorts of reasoning. The political economy behind regional policies is complex and involves both economic and noneconomic arguments, as well as the different interests of agents located in distinct parts of the territory of a country. The well-known cases of German reunification and the formation of the European Union highlight the complexity of the motivations for regional interventions, exemplifying the important role of political and humanitarian reasons.

Given the observation that most countries have some sort of regional policy, the question arises about the effectiveness of such mechanisms. Across the many existing experiences, the evidence is not conclusive about their results. Judging by the existing levels of concentration and disparities in countries that have implemented regional programs, one would be forced to conclude that the results are, at most, meager. The flaw might come from a misguided theoretical basis or from problems in the implementation process. In the first case, it could be that the need for the intervention was not there, or that the key aspects were not clear from a theoretical point of view. As for implementation, the problems could be more complex than the theoretical view would identify, and/or the mechanisms designed were not appropriate for the task. Considerations about the quality of human capital and institutions (social capital) at the regional level are important in the design and implementation of regional policies. Observing cases of failure or success helps little in judging the effectiveness of the interventions: the post-facto result alone does not reveal the situation that would have prevailed had the programs not been implemented. The chapter presented a discussion of ways of assessing the impacts of regional policies, with methodological alternatives to deal with this problem, such as techniques using instrumental variables, difference-in-differences (Dif-in-Dif), regression discontinuity design, synthetic control methods, and computable general equilibrium models.

Regions do have an identity, for historical and cultural reasons, and present specific conditions that shape their wish to improve the wellbeing of their inhabitants, irrespective of what is happening in other parts of the nation. Their goals and values may or may not be compatible with the national goals of the countries they belong to. As such, there might be a conflict between the national goal of economic growth and efficiency and the accommodation of regional interests linked to economic, social, and political disparities. The political system, which incorporates all these different interests, has to deal with this sort of conflict and produce attempts to solve it. Regional conflicts of interest are present in most countries, rich and poor, as the cases of Brexit, the north of Italy, and Cataluña, in Spain, illustrate. But they are particularly important in cases in which the levels of wellbeing in the lagging regions are below levels acceptable on humanitarian grounds.

7 Cross-References

▶ European Regional Policy: What Can Be Learned
▶ New Trends in Regional Policy: Place-based Component and Structural Policies
▶ Place-Based Policies: Principles and Developing Country Applications
▶ Regional Policies in East-Central Europe
▶ Rural Development Policies at Stake: Structural Changes and Target Evolutions During the Last 50 Years
▶ Special Economic Zones and the Political Economy of Place-Based Policies

References
Brakman S, Garretsen H, Gorter J, Horst A, Schramm M (2005) New economic geography, empirics, and regional policy. CPB Netherlands Bureau for Economic Policy Analysis, No 56, 27 May 2005, ISBN 90-5833-281-7. http://www2.ef.jcu.cz/~klufova/spatial_economy/NEG_1.pdf
Gagliardi L, Percoco M (2017) The impact of European Cohesion Policy in urban and rural regions. Reg Stud 51(6):857–868. https://doi.org/10.1080/00343404.2016.1179384
Ganau R, Rodríguez-Pose A (2019) Do high-quality local institutions shape labour productivity in Western European manufacturing firms? Pap Reg Sci 98(4):1633–1666
Giua M (2017) Spatial discontinuity for the impact assessment of the EU regional policy: the case of Italian objective 1 regions. J Reg Sci 57(1):109–131
Partridge MD, Rickman DS, Olfert MR, Tan Y (2015) When spatial equilibrium fails: is place-based policy second best? Reg Stud 49(8):1303–1325. https://doi.org/10.1080/00343404.2013.837999
World Bank (2009) World development report 2009: reshaping economic geography. World Bank, Washington, DC

Further Readings
Aroca P, Azzoni C, Sarrias M (2018) Regional concentration and national economic growth in Brazil and Chile. Lett Spat Resour Sci 11(3):343–359
Baer W (ed) (2012) The regional impact of national policies: the case of Brazil. Edward Elgar, Cheltenham, p 254
Capello R, Camagni R, Chizzolini B, Fratesi U (2008) Modeling regional scenarios for the enlarged Europe: European competitiveness and global strategies. Advances in spatial science. Springer, Berlin


Isard W, Azis IJ, Drennan MP, Miller RE, Saltzman S, Thorbecke E (1998) Methods of interregional and regional analysis. Ashgate, Aldershot
Marzinotto B (2012) The growth effects of EU cohesion policy: a meta-analysis. Bruegel working paper 2012/14. http://bruegel.org/wp-content/uploads/imported/publications/WP_2012_14_cohesion__2_.pdf
Phinnemore D (2017) The European Union: establishment and development. In: Cini M, Pérez-Solórzano Borragán N (eds) European Union politics, 5th edn. Oxford University Press, Sept 2017. https://doi.org/10.1093/hepl/9780198708933.003.0002
Silveira-Neto R, Azzoni C (2012) Social policy as regional policy: market and nonmarket factors determining regional inequality. J Reg Sci 52(3):422–450
Sonis M, Hewings GJD (2009) Tool kits in regional science: theory, models, and estimation. Advances in spatial science. Springer, Berlin

Place-Based Policies: Principles and Developing Country Applications

Gilles Duranton and Anthony J. Venables

Contents
1 Introduction
2 Place-Based Policies: The Spatial Context
2.1 Proximity, Connectivity, and Clusters
2.2 Complementarities and Coordination
2.3 Adjustment Mechanisms
2.4 Spatial Equilibrium
3 Place-Based Policies: Quantity Effects and Valuation
3.1 Quantity Effects
3.2 Valuation: Market Failure
3.3 Policy Targeting
4 Place-Based Policies: Zones, Corridors, and Regions
4.1 Special Economic Zones
4.2 Corridors and Long-Distance Transport Infrastructure
4.3 Lagging Regions and Big-Push Policies
4.4 Local Economic Development Policies
5 Concluding Comments
6 Cross-References
References

G. Duranton
The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
e-mail: [email protected]

A. J. Venables (*)
Department of Economics, University of Oxford, Oxford, UK
e-mail: [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_142


Abstract

Many development policies, such as placement of infrastructure or local economic development schemes, are “place-based.” Such policies are generally intended to stimulate private sector investment and economic growth in the treated place, and as such they are difficult to appraise and evaluate. This chapter sets out the rationale for place-based policies and a framework for analyzing their effects and assessing their social value. It then reviews the literature on place-based policies in the contexts of special economic zones, transport policy, lagging regions, and local economic development policies.

Keywords

Place-based policies · Spatial · Economic corridors · Lagging regions

1 Introduction

Place-based policies are an integral part of development policy, driven by both equity and efficiency concerns.

Equity, because the process of economic development is spatially uneven. Inevitably, some places will develop before others, creating the likelihood of a Kuznets curve, with spatial inequalities increasing before diminishing during the course of development (World Bank 2009). Many of the public investments that are needed for development – transport and other infrastructure investments – are “lumpy” and so have to be concentrated in particular places; some places will be served with roads, telecommunications, power, and other public services before others. These regional disparities are likely to show up, perhaps dangerously, in regional and national politics.

Efficiency, because a spatial development process involves market failures. There are numerous externalities, some positive, arising in cluster formation, others negative, arising from congestion in cities and from poverty in lagging regions. Fundamentally, private investment tends to cluster, as each investment decision is shaped by investments made by other private sector decision-takers: the location decisions of workers, customers, and supplier firms. The market mechanism does not do a good job of coordinating these spatial decisions, creating the possibility that some places become trapped in a low-level equilibrium. These market failures create an efficiency argument for policy intervention but also mean that the effects of policy are difficult to predict.

Place-based policies (PBPs) include infrastructure investments, special economic zones, treatment of lagging regions, and local economic development policies. From the economic standpoint (the focus of this chapter), place-based policies have several distinctive features. The first is the spatial context; in Sect. 2 of the chapter, we outline the particular inefficiencies that arise in this spatial context. The second is that PBPs are likely to be motivated by achieving “indirect” as well as direct effects. For example, a transport improvement will have the direct effect of saving time and vehicle operating costs and may also have the indirect effect of inducing private sector investment, job creation, and higher productivity. Indirect effects depend on responses of the private sector, which are hard to predict and hard to value. These issues are addressed in Sect. 3, before turning to specific examples of policy – special economic zones, transport infrastructure, lagging regions, and local economic development policy – in Sect. 4.

2 Place-Based Policies: The Spatial Context

Several features of the spatial context are critical for understanding the role and effect of PBP. These are the role of proximity and connectivity in creating clusters of economic activity, the existence of complementarities between actions taken by different economic agents and the failure of the market mechanism to coordinate these decisions, and the process of economic adjustment to place-specific regional shocks. We discuss each of these in turn.

2.1 Proximity, Connectivity, and Clusters

Proximity is valuable. Being close together enables firms and households to save costs of transport, commuting, and communication. There is evidence that provision of infrastructure is much cheaper, per person, in dense urban areas than in dispersed rural ones (Foster and Briceno-Garmendia 2010). Proximity also creates the potential for agglomeration economies that are generated by close and intense economic interaction. These arise through several different mechanisms. Thick labor markets enable better matching of workers to firms’ skill requirements. Better communication between firms and their customers and suppliers enables knowledge spillovers, better product design, and timely production. A larger local market enables development of a larger network of more specialized suppliers. Fundamentally, larger and denser markets allow for scale, scope, and specialization.

A good example is given by specialist workers or suppliers. The larger the market, the more likely it is to be worthwhile for an individual to specialize and hone skills in producing a particular good or service. The presence of highly specialized skills will raise overall productivity. The specialist will be paid for the product or service supplied but, depending on market conditions, is unlikely to capture the full benefit created. Since the benefit is split between the supplier and her customers, there is a positive externality which creates a positive feedback – more firms will be attracted to the place to receive the benefit, growing the market, further increasing the returns to specialization, and so on. This is the classic process of cluster formation.

The magnitude of agglomeration effects has been the subject of much research. A consensus view, based on studies in developed economies, suggests an elasticity of productivity with respect to output in the range 0.03–0.09; this implies that each doubling of “economic mass” raises productivity by approximately 5% (see Combes and Gobillon’s (2015) survey article). This is a large effect, suggesting that productivity in a city of 5 million is 20–30% higher than it is in a city of 200,000. Chauvin et al. (2017) find larger effects in a study of several large developing economies.
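As a back-of-the-envelope check on these magnitudes, the short calculation below (a hypothetical illustration, not taken from the chapter) applies the implied power-law relation, productivity ratio = (size ratio) raised to the elasticity, for several elasticities in the cited range:

# Productivity premium implied by an agglomeration elasticity:
# productivity ratio between two cities = (size ratio) ** elasticity
size_ratio = 5_000_000 / 200_000  # city of 5 million vs. city of 200,000

for elasticity in (0.03, 0.05, 0.07, 0.09):
    doubling_gain = 2 ** elasticity - 1      # gain from doubling economic mass
    city_gap = size_ratio ** elasticity - 1  # implied large-vs-small-city gap
    print(f"elasticity {elasticity:.2f}: doubling -> +{doubling_gain:.1%}, "
          f"5m vs. 200k -> +{city_gap:.1%}")

A mid-range elasticity of 0.07 gives roughly +5% per doubling and a gap of about 25% between the two cities, consistent with the figures quoted above.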


Economic proximity can be achieved by either density or connectivity. Density (high population per unit land area) brings with it costs of congestion (a negative externality) and other costs of close urban living; land becomes the scarce factor, and housing consumption (floor-space per household) is reduced. Improved connectivity enables economic interaction over a greater geographical distance and is achieved by transport improvements within and between cities, as discussed in Sect. 4.2.

2.2 Complementarities and Coordination

Location choices – be they by firms or by households – are typically major decisions, with large sunk costs and, if structures are built, creating long-lived assets; expectations of future returns are therefore critical. Agglomeration economies and other complementarities mean that the returns to investing in a place depend positively on who else is (or is expected to be) there. It follows that there is a first-mover problem: no one wants to move to a new place while uncertain about its future development. This creates inertia and path dependency, because firms are unwilling to move out of existing clusters; it is difficult to initiate new clusters of activity, and existing centers may therefore become excessively large. Coordinated movement by a group of firms or a “large developer” can solve this problem, but generally there is coordination failure. Thus, a planner can construct a model of a perfectly functioning new town, but there has to be a path of public and private sector investment decisions that leads from the initial situation of empty fields to the completed, and occupied, town. If such a path is not in place, then development will fail.

These arguments are particularly acute in a developing country. This is partly because the economic environment is one of rapid change in all respects, including the spatial, and partly because the cost of making a poor location decision may be higher. In a developed economy, even if a firm does not have a supplier of some specialist component in the same city, there is probably a supplier within 24 h delivery time. In a developing country this is not true, so the need for coordination – and the cost of coordination failure – is greater.

2.3 Adjustment Mechanisms

Coordination failure makes it difficult to establish alternative economic centers, but why doesn’t the price mechanism make the alternative place so cheap that firms will move there? The answer is that, within a country, the performance of a region depends largely on its absolute advantage, not its comparative advantage. If a country’s export sector has a negative shock, the adjustment mechanism is a real depreciation, i.e., a reduction in its wage and unit costs relative to its trading partners, and this reduction continues until other sectors become competitive. But if a region within a country suffers a negative shock, there may be little flexibility of relative wages between regions, as labor markets are relatively more integrated. Integration is in part institutional and partly due to intra-country mobility of labor and capital that tends to equalize real wages across regions. Of course, there are some immobile factors, such as land and houses. Their prices will fall in the adversely affected region, but since these factors only represent a small fraction of business costs, they have little leverage in bringing areas to the point of competitiveness. Since the mechanism of real depreciation is largely absent, it follows that regional inequalities are likely to be persistent and that the market response is not to move new employment into the area, but to (eventually) move labor out.

2.4 Spatial Equilibrium

The features described above create an outcome where economic activity may be unevenly and inefficiently distributed across space. Clustering is associated with externalities (positive and negative), so the market outcome is inefficient. Coordination failure means that it is hard to start activity in a new place, leading to excess primacy, i.e., the tendency for the largest cities in a country to be excessively large compared to those further down the urban hierarchy. We know from the data that this is a common feature of developing countries, compared to high-income countries now and also when they were at comparable stages of development (Kim 2008). Adjustment to shocks is slow and imperfect, so some regions will lag and regional income differentials may open up.

A further possibility is that there may be multiple equilibria, meaning that the same economic fundamentals might support quite different patterns of regional activity. A region may be trapped in a low-level equilibrium or poverty trap, in which case it may require a large shock (or “big push”) to move to a different, and possibly superior, equilibrium. Since each of these cases involves market failure and inefficiency, there is a case for PBP; however, the design, implementation, and evaluation of such policy are difficult.
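To see how the same fundamentals can support different spatial configurations, consider the following toy model, a hypothetical sketch for illustration rather than a model taken from this chapter. Mobile firms choose between two symmetric regions; co-located firms generate linkage demand (an agglomeration force), while market crowding and congestion push toward dispersion. Depending on transport costs, the stable outcome is dispersion, agglomeration in either region (two equilibria from one set of fundamentals), or dispersion again:

def profit_gap(lam, phi, beta=0.5, kappa=0.05):
    # Profit difference (region 1 minus region 2) for a mobile firm.
    #   lam:   share of mobile firms in region 1
    #   phi:   trade "freeness" = exp(-tau); phi near 1 means low transport cost
    #   beta:  linkage demand created by co-located firms (agglomeration force)
    #   kappa: congestion cost rising with local crowding (dispersion force)
    # Each region also has 1/2 unit of immobile demand; revenue in a market is
    # split across sellers there, with distant sellers discounted by phi.
    d1, d2 = 0.5 + beta * lam, 0.5 + beta * (1.0 - lam)
    n1 = lam + (1.0 - lam) * phi        # effective sellers in market 1
    n2 = (1.0 - lam) + lam * phi        # effective sellers in market 2
    pi1 = d1 / n1 + phi * d2 / n2 - kappa * lam
    pi2 = d2 / n2 + phi * d1 / n1 - kappa * (1.0 - lam)
    return pi1 - pi2

def long_run_share(phi, lam0, steps=40000, dt=0.05):
    # Firms gradually relocate toward the region offering higher profit.
    lam = lam0
    for _ in range(steps):
        lam += dt * lam * (1.0 - lam) * profit_gap(lam, phi)
        lam = min(max(lam, 1e-9), 1.0 - 1e-9)
    return lam

for phi in (0.30, 0.70, 0.80, 0.95):
    outcomes = sorted({round(long_run_share(phi, s), 2) for s in (0.1, 0.45, 0.55, 0.9)})
    print(f"phi = {phi:.2f}  ->  stable shares of region 1: {outcomes}")

With these (invented) parameters, high transport costs (phi = 0.30) and near-zero ones (phi = 0.95) both yield the dispersed outcome (share 0.5), while intermediate values tip the system toward full agglomeration in whichever region starts slightly ahead – a simple instance of multiple equilibria and history dependence.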

3 Place-Based Policies: Quantity Effects and Valuation

How does PBP change outcomes, and what is the social value of any such changes? Outcomes (“quantity effects”) occur close to the intervention but also further afield. Their value depends on the presence of market failures of various sorts.

3.1 Quantity Effects

What economic variables might be changed by PBP? There are direct quantity effects (e.g., increasing the supply of electricity in a place), but, as noted above, PBP is often intended to bring indirect effects, i.e., changes in private sector investment and employment. What determines these private sector location decisions, both in the neighborhood of a place treated by PBP and more widely in the economy?


Location and investment decisions may be hard for policy to change. The private sector will decide to invest in a place only if multiple conditions are met, conditions such as the natural characteristics of the place, the policy environment, and the “business ecosystem.” There is a long checklist of such conditions. Natural geography matters (e.g., proximity to ports). There are numerous dimensions of the policy environment, including the provision of infrastructure (utilities, transport), place-specific tax and regulation, and public services. Above all, there is the presence of a “business ecosystem,” meaning the network of organizations – suppliers, distributors, competitors, customers, and workers – that contribute to the performance of firms and the value of investment decisions. Does the place have a stock of firms and other productive activities, in particular its suppliers and customers? Is there a supply of workers with appropriate skills at competitive wages and able to commute to the establishment? What is the availability of other factors, such as land and capital? And how big is the market area to which the place is connected?

There is a high degree of complementarity between elements of the list, meaning that if just one element is missing or inadequate, investment is unlikely to occur. This weakest link problem creates threshold effects and discontinuous responses of private investment to policy levers. For example, adding more utilities in a place may have no effect if other conditions are not present. Or, if other conditions are met, more utilities may push the place across a threshold and trigger a large private investment response. These strong complementarities therefore make it inherently difficult to predict the quantity changes that any one policy instrument might bring about (see the sketch at the end of this subsection).

Displacement and general equilibrium effects arise as PBP may succeed in expanding employment in one area, but this might simply be a relocation of activity from elsewhere in the country. Such displacement effects can occur through several distinct routes. The most direct is competition for a particular project – such as a single factory that will operate in just one of many possible places. More generally, product market displacement occurs, particularly if demand is inelastic, as an increase in supply in one place leads to a reduction in supply elsewhere. This effect is most pronounced for non-tradable goods, where demand comes just from a local or national market. Displacement may also occur through factor markets; if there is a fixed supply of capital or full employment of a given labor force, then expansion of one activity must be accommodated by contraction of another. This may look similar to product market displacement, but its effects are quite different. It operates by raising demand for factors, tending to increase their price and cause the exit of the lowest productivity firms. Wages, productivity, and real income therefore increase.

Note also that displacements may be positive as well as negative. Consider, for instance, a new highway linking a large and a small city. This highway will make the cheaper land of the small city more easily accessible. In turn, this can lead firms in the large city to relocate to the small city. It is also possible that firms in the large city can now serve the small city at a lower cost, which could lead to the opposite effect and the concentration of more activity in the larger city. If consumers in both cities keep consuming the same amount, any change in the location of economic activity is pure displacement.

Effective design and implementation of PBP requires that policy is coordinated across space, function, and time. A spatially integrated policy process is needed because of the risk of displacement between areas and because developing cities often grow into neighboring local jurisdictions. Policy needs to be integrated functionally, i.e., covering planning, land and building regulations, infrastructure and utilities, and public service provision. And policy needs a long view, able to make credible commitments to future city development. This requires coordination and consistency between different levels of government – local, regional, and national. Competence, financial resources, and credibility require an authorizing environment more integrated than that which is typically present when formulating and implementing PBP.

A further issue is whether policy should lead or lag development. Much PBP lags development, responding to bottlenecks such as congestion. Economic activity has then revealed where investment is needed, and this reduces uncertainty about the effects of policy. But lagging policy has disadvantages, putting the economy through a period when constraints are costly and incurring the costs of retrofitting infrastructure in crowded places. Alternatively, PBP may lead development in a place, potentially acting as a coordination mechanism. As examples, an infrastructure project may be a credible commitment that a place is selected for development, and construction of transport lines creates a focal point for development in a growing city. This can shape expectations and act as a catalyst to trigger private sector investment. Some PBPs are intended to achieve these effects, but such policies are inevitably more speculative than those responding to existing levels of activity.
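As promised above, here is a minimal sketch of the weakest-link logic, with hypothetical condition names and thresholds (not from the chapter): private investment responds to the worst site condition, so improving an already-adequate condition does nothing, while fixing the binding one triggers a discontinuous response.

def investment_response(conditions, threshold=0.6, scale=100.0):
    # Stylized weakest-link rule: investment is triggered only when the
    # *worst* site condition clears the threshold, so the response to any
    # single policy lever is discontinuous rather than smooth.
    binding = min(conditions.values())
    return scale if binding >= threshold else 0.0

site = {"power": 0.9, "transport": 0.8, "skills": 0.7, "land_rights": 0.3}

print(investment_response(site))  # 0.0   -> nothing happens
site["power"] = 1.0               # improving a non-binding condition...
print(investment_response(site))  # 0.0   -> ...still nothing
site["land_rights"] = 0.65        # fixing the weakest link...
print(investment_response(site))  # 100.0 -> ...crosses the threshold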

3.2 Valuation: Market Failure

Given some knowledge about the quantity changes (direct and indirect) created by a policy intervention, what is the social value of the intervention? Valuation of the direct effects is the stuff of standard cost-benefit analysis. Outputs may be valued at market prices (as they would be in a commercial decision), or, if outputs are non-marketed, values have to be inferred (e.g., for transport investments, by estimates of the value of time).

What about indirect effects, the relocation of labor and capital between places and sectors of the economy arising as policy changes private sector investment decisions? The benchmark case is that the value of these changes is zero. If the economy is efficient, then the marginal value of labor is the same in all its uses, and so too is the marginal value of capital. Moving them between uses is therefore of zero value – additional output in the new use is worth just the same as lost output in the alternative.

This argument is the rationale for only looking at the direct effects of policy, as in standard cost-benefit analysis; if the economy is efficient, then other indirect effects net out to have zero value.
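In symbols – a standard appraisal decomposition, stated here as a sketch rather than a formula from this chapter – the welfare effect of a policy can be written as

\[
\Delta W \;\approx\; B_{\mathrm{direct}} \;+\; \sum_{i} \bigl(v_i^{\mathrm{social}} - v_i^{\mathrm{private}}\bigr)\,\Delta x_i ,
\]

where \(B_{\mathrm{direct}}\) is the direct benefit captured by standard cost-benefit analysis, \(\Delta x_i\) are the induced (indirect) quantity changes, and each wedge \(v_i^{\mathrm{social}} - v_i^{\mathrm{private}}\) measures a market failure. If the economy is efficient, every wedge is zero and only the direct term survives.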


It follows that indirect effects are of net positive value only if they tend to correct inefficiencies, i.e., operate to raise productivity or to draw resources from lower to higher value use. This will be the case if there are market failures, such that the price system (or some other mechanism) has not lined up marginal values of resources used in different activities and places. We have already discussed some of the market failures that arise because of agglomeration (a positive externality) and congestion (negative). Particularly important are coordination failure and the first-mover problem, causing firms and workers to continue to pile into existing clusters as no one wants to be the first to move out to an uncertain future in a new area. The equilibrium may be inefficient, with some places being inefficiently large, others too small. If policy can kick-start the development of a new area, then it may create agglomeration benefits and also take pressure off an existing center by reducing congestion.

In the developing country context, these potential inefficiencies interact with many other market failures. In the labor market, there are wide gaps between the productivity of workers in formal sector employment and in the informal sector or in agriculture; Gollin et al. (2014) find that the productivity of labor in non-agriculture is between two and three times higher than it is in agriculture. Human (and social) capital acquisition is limited, often much more so in lagging than in fast-growing regions. And in the land market, investment in buildings may be inhibited by lack of clarity in property rights or inappropriate building regulations. This is exacerbated by capital market failures and, in many developing countries, the absence of a mortgage market.

3.3 Policy Targeting

PBP should be based on a systematic treatment of both of these stages: establishing the likely quantity effects of a policy and then asking whether these effects are of net positive social value. From the efficiency standpoint, this generally means identifying and quantifying the market failure that the policy is intended to address. The theory of policy targeting tells us that the fundamental determinants of market failure should be diagnosed and then addressed by policy targeted as precisely as possible on these determinants. Thus, if unclear land rights hamper building, then the first-best policy is to clarify these rights, and so on. PBP is likely to be second (or n-th) best policy; policy makers should be aware that better policies, more closely targeted on the market failure, may exist. The presence of multiple market imperfections complicates targeting, as a particular PBP may change many quantities and interact with many market failures, perhaps not being the first-best response for any of them. Despite these complexities, policy should be based on a rigorous diagnosis of the market failures it is supposed to address and appraised by realistic assessment of the quantity changes it is likely to create and the social value of such changes.

4 Place-Based Policies: Zones, Corridors, and Regions

We now turn to specific place-based policies, focusing on PBPs that are designed principally to achieve “indirect effects”: special economic zones (SEZs), corridors and long-distance transport improvements, lagging regions, and local economic development policies. Discussion of each is framed in terms of its effects on quantities and the social value of any such induced quantity changes.

4.1 Special Economic Zones

We define special economic zones broadly to encompass free-trade zones, export-processing zones, or any district targeted with favorable fiscal or institutional treatment. The principal objective of SEZs is to attract investment and create jobs. Ideally, this is investment in internationally footloose activities that would not otherwise have come to the country and hence jobs that are additional, not just displaced. Successful SEZs will link to the rest of the economy and stimulate growth more widely, with the ultimate objective of convergence between well-performing zones and the rest of the country. Further benefits might flow from the supply of foreign exchange (in the investment itself and through the sale of output) and from the fact that SEZs can be a way for governments to experiment and find out what policies work.

SEZs employ a range of policies in a well-defined geographical area or areas. They are of distinct types, involving infrastructure provision (especially power and transport); provision of land; distinct regulatory regimes, often involving laxer labor regulations and restrictions on trade union activity; and tax incentives, including duty-free processing zones and holidays from corporate income taxes. Management of SEZs often works closely with private sector investors to facilitate investment and ease bottlenecks.

The economic case for pursuing these policies in tightly defined geographical areas rests on two arguments. One is “first-best” and based on the economic efficiency gains derived from spatial concentration in the provision of infrastructure and development of clusters, as discussed above. The other is “second-best,” based on the presence of institutional and financial constraints that create economic inefficiencies. These could, in principle, be removed economy-wide. But in practice this is infeasible, partly because of the fiscal cost (tax and customs regulation, infrastructure) and partly because of the political obstacles that would be encountered (e.g., in acquiring land and implementing regulatory reform).

Quantity effects – attracting investment: As outlined in Sect. 3.1, success in attracting investment requires that a wide range of conditions be met, covering geography, policy, and the business ecosystem. There is a weakest link problem, and many SEZs have failed because key elements of the package are absent.

First, SEZs need to be located in places consistent with their objectives and long-run economic viability. If they are export oriented (or import dependent), they need to have good access to port infrastructure. In countries where even well-located regions have difficulty attracting investment, SEZs in backward or remote regions are unlikely to succeed. The economic size of the SEZ (to reap scale and agglomeration economies) and of the area where it is located (to provide a local labor market and depth of local firms) are important factors. Their importance is confirmed in Farole’s (2011) study of African SEZs. However, finding a “good location” is necessary but not sufficient for success. The success of an SEZ will also be determined by its comparative advantage, so unrealistic sectoral selection will lead to failure.

There is considerable evidence that tax incentives alone are insufficient for success. Assessing the marginal impact of one policy is difficult, given the complementarities between policies and the country context (policies in place outside the SEZ). Nevertheless, Farole (2011) looks at data across 77 countries and finds that infrastructure and trade facilitation have a significant positive impact, while tax and other financial incentives are much less important.

Effective implementation of policy matters. This requires action that is coordinated across functions (tax, land, infrastructure), so the organization running the SEZ needs to be empowered to deliver these functions. There must also be credible commitment to policy for many years ahead. Taken together, these considerations mean that commitment is needed from the highest level of government, while at the same time the SEZ authority needs to be responsive to the concerns of firms in the zone.

Linkages and growth: A successful SEZ will have an internal dynamic of spillovers between firms, agglomeration, and productivity growth. This will have a horizontal element, with a large number of firms in the same sector building up thick labor markets and other agglomeration economies, and a vertical element, with co-location of input suppliers and the growth of forward and backward linkages. This process encounters the first-mover coordination problem discussed above – it is hard to start a cluster. Involvement of one or several large firms is one route to kick-start this, as with the multinational electronics companies (including AMD, Fairchild Semiconductor, and Intel) initially attracted to Penang, Malaysia, or Phillips-Van Heusen’s project in Hawassa, Ethiopia. Attracting such companies requires that government works closely with the companies and commits to deliver international standards.

Links from the SEZ to the local economy include development of skills in the local labor market; expanding the technological capabilities of local firms; increasing use of local firms as suppliers and as customers; and entrepreneurial spinoffs from firms in the zone. Successful SEZs have seen an increasing fraction of activity in the SEZ being undertaken by local firms, sometimes as part of a maturing and upgrading process. In Mauritius the SEZ upgraded from low-value textiles to higher value and more skill-intensive products (off-shoring low-value production to the SEZ in Madagascar). In Malaysia the Penang SEZ focused from the start on electronics but upgraded from basic assembly to more advanced and skill-intensive goods. Both these sectoral transitions were accompanied by a transition toward locally owned firms.


The role of government in this process is important and needs to be based on recognition that there are mutual benefits – for firms in the SEZ and for the local economy – from developing these spillovers. Thus, rigid domestic content requirements are likely to be viewed as a cost to firms in the SEZ and may transfer little learning to firms outside. But working to bring local firms up to the level where they are chosen suppliers is of mutual benefit. The knowledge transfer is also of value to government itself, as SEZs can provide a vehicle for learning about what makes an effective business environment. China explicitly used SEZs as vehicles for policy experimentation.

The value of SEZs: The costs and benefits of an SEZ depend on the quantity responses elicited and on displacement – the extent to which the investments and jobs created are additional to those that would have occurred absent the policy. The value of jobs created depends on the state of the local labor market and the alternative sources of employment, as we saw in Sect. 3. Linkages to the local economy should be included to derive the net number of jobs in the economy relative to a situation without the SEZ policy.

Benefits accrue directly through (net) job creation in the SEZ and also through potential impacts on the wider economy. One mechanism is sheer scale: in Bangladesh and Mauritius, the scale of job creation (in the SEZ, in suppliers, and via spending from wages) surely raised incomes not just of those employed in the SEZ but also, through tightening the labor market throughout the country, raised wages more widely and ultimately improved the country’s terms of trade. Other mechanisms operate through raising the skills and capabilities of workers and firms both inside and outside the zone and through the consequent dynamics of productivity growth and increasing competitiveness in international markets. The gains are potentially substantial, but as suggested above, achieving them requires meeting substantially all of a large set of conditions.

Finally, the costs of the policy depend on the set of instruments used. Tax breaks appear expensive but have to be compared with the revenue that would have been earned absent the SEZ; compared to this counterfactual, they are costly only if they divert taxpaying firms into the zone rather than create new investment in the zone. Infrastructure investment is riskier, since costs are incurred at early stages of development, while benefits depend on the success of the SEZ. Regulatory innovation and soft policy is low cost, and the government learns from it even in the event of failure.

4.2 Corridors and Long-Distance Transport Infrastructure

The development of medium- or long-distance transport improvements typically has the objectives of reducing costs for users of the route and stimulating economic activity at places along the route. Analysis of whether these objectives are likely to be achieved by a particular project is hard. No one would doubt that a completely isolated place will be poor or that most rich places are well connected. But it does not follow from these observations that all well-connected places are rich or that improving connectivity necessarily brings development. Furthermore, changes in economic geography brought about by transport improvements might mean that some places gain, others lose.

Transport and quantity changes – theory: What are the effects of reducing the costs of flows of services, goods, and people in and out of an area? Several arguments suggest that a transport improvement is a double-edged sword which can make a place either more or less attractive as a location for production and employment. One obvious tension arises as transport both improves access to export markets and opens the local market to import competition; export sectors will benefit, but import-competing sectors will suffer. Another comes from the interplay between market access and production costs. A number of models find that reducing transport costs from a high level to an intermediate level is a source of divergence (one of the connected points may expand at the expense of others), while reducing transport costs further gives the opposite result, leading to convergence (see Fujita et al. 1999 and Combes et al. 2005). The ambiguity hinges on there being some lumpiness or increasing returns in investment decisions. In this case, when transport costs are very high, each place will have some production to supply its local market. When costs come down, places can be supplied from a single center, and firms will choose the place which has the best market access (i.e., is best able to reach consumers across the area). But when transport costs become very low, these market access arguments become unimportant, and production cost differences become the most important consideration. Reducing transport costs to this point may therefore cause firms to move out of a congested center to more peripheral regions. These arguments become more acute the greater the mobility of people and firms. As firms move, the “business ecosystem” becomes undermined; and since market size matters for firm location, a region that is losing population will become ever less attractive for firms. There may be threshold effects, so change may be sudden. Falling trade costs may enable new clusters to develop and old ones to unravel, a phenomenon seen clearly in the impact of international trade and globalization on many established centers of manufacturing production (Fujita et al. 1999), and a pattern illustrated by the toy model sketched at the end of Sect. 2.4.

Transport and quantity changes – empirics: Despite the central importance of the issue and much attention devoted to it in empirical work, our empirical knowledge is still extremely scarce, and perhaps the precious little that we know is of limited applicability to new projects. The first reason for this is the ambiguity discussed in the preceding paragraphs. Even simple models suggest that effects can vary greatly according to details of the context, in view of which it is not surprising that an empirical consensus has not emerged. Real-world contexts are inevitably much more complex than theory modeling, not least since the theory is often based on just two locations. This simplification is often needed to generate results and is sometimes justified when thinking, for instance, about broad patterns of economic integration between a center and a periphery. Unfortunately, this two-location simplification is far from innocuous in our context. In a two-location model, employment growth in one location can only come at the expense of the other. In reality, we need to think about locations treated by a transportation project and locations not directly treated but which may nonetheless be affected as firms and workers relocate to the treated location. We should expect highly heterogeneous outcomes when lowering shipping costs across places. Instead, most of the research has attempted to estimate average effects.

The standard approach taken by research is a regression of a change in an outcome such as local employment or productivity on a change in infrastructure (or sometimes an initial level of infrastructure, a valid approach when adjustments are slow). Even using this standard approach, further difficulties are encountered. One is simultaneity. Rational infrastructure planning requires placing infrastructure in areas where it has most impact. This obviously leads to a spurious correlation between the outcome of interest and the placement of infrastructure. Alternatively, political realities probably imply that public works take place in locations hit by negative shocks. To get around this problem, the literature has developed a number of empirical strategies, such as relying on the examination of “in-between” locations along corridors that were incidentally served, engineering/cost predictions, or old infrastructure/plans developed under very different circumstances to generate some quasi-random variation for the corridors that were developed (see Redding and Turner (2015) for a thorough discussion of these issues).

Another fundamental problem is distinguishing between displacement and net growth; if a treated place does better than an untreated one, it could be because of pure displacement, with no net benefit. To be able to distinguish, the researcher needs three groups of units: a treatment group receiving direct access to the infrastructure, an indirectly treated group that may have suffered from displacement, and a control group of truly unaffected locations. Treated areas are relatively easy to define; for a new train line, there may be cities with new stations. Indirectly treated areas are harder to define; there may be cities that did not receive a new train station and are neighbors of treated cities, acknowledging that there are many shades of neighbors. The true difficulty is finding a valid control group. Assuming displacement effects decline with distance to the treatment – which seems like a reasonable assumption, although still an assumption – one would like a control group made of really remote cities. At the same time, these remote cities are likely to be different from the treated cities. Imagine, for instance, a road improvement scheme on the island of Java in Indonesia that affects half the cities. Given the small size of the island, the other half is likely to be subject to displacements. For a control group, one will probably need to examine cities on the island of Sumatra, but whether such cities form an appropriate control group is unclear. They are likely to be subject to different dynamics, and it is hard to rule out displacement effects reaching Sumatran cities as well. (A stylized illustration of this three-group design appears at the end of this subsection.)

Notwithstanding these difficulties, there is now a large empirical literature on corridors – one too large to be reviewed comprehensively. The interested reader should refer to the reviews by Redding and Turner (2015) and Berg et al. (2017), the latter focused on developing countries. The literature points to four main findings so far.
First, corridors tend to attract economic activity, and at least some of this is driven by displacement from locations more remote from the infrastructure. Second, transport infrastructure also tends to promote the decentralization of economic activity within a corridor area away from the main centers. This dispersion of activity is nonetheless far from uniform along corridors. Third, corridors appear to promote various efficiency gains through higher productivity and less factor misallocation. Fourth, corridors affect several margins, including the aggregate amount of economic activity in a location, its distribution across sectors, its distribution across skills and functions (production vs. management, for instance), and participation in external markets (labor markets in other locations or agricultural markets when moving away from subsistence farming). Given the methodological problems noted above, these findings should be viewed as tentative. They also reflect broad trends in the data but should not be expected to hold every time, given the heterogeneity associated with relocations.

Valuing transport improvements: Transport improvements bring direct benefits of cost reduction and time savings. If they also trigger private sector investments and other “indirect quantity” effects, what is the additional social value? Relocation of economic activity toward relatively poor areas brings distributional benefits and possibly associated political gain. These are important, but we should also ask whether there are aggregate gains to the country as a whole, and the answer depends on the interaction between the quantity changes and market imperfections. Several such arguments are noteworthy. One is based on the benefits of proximity and connectivity, including agglomeration externalities. Transport improvement has the effect of making places closer together in economic, if not geographic, terms. Thus, if there are positive agglomeration externalities, transport improvement will tend to raise productivity in the area, and this is a source of net real income gain. A second argument turns on the potential of transport improvement to cause major relocation of activity and stimulate the development of new centers. We discuss this in the next subsection, in the context of “big-push” policies.
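As flagged above, here is a stylized illustration of the three-group evaluation design on synthetic data (all numbers are invented for illustration): comparing treated corridor cities with their displaced neighbors conflates the true effect with displacement, while a genuinely unaffected control group recovers each component.

import random
random.seed(0)

TRUE_EFFECT, DISPLACEMENT = 5.0, -3.0  # invented data-generating values

def outcome_change(group, n=200, noise=1.0):
    # Simulated change in the outcome (say, log employment) for one group.
    shift = {"treated": TRUE_EFFECT, "neighbor": DISPLACEMENT, "remote": 0.0}[group]
    return [shift + random.gauss(0.0, noise) for _ in range(n)]

treated = outcome_change("treated")    # cities on the new corridor
neighbor = outcome_change("neighbor")  # indirectly treated: activity relocates away
remote = outcome_change("remote")      # valid controls, if truly unaffected

def mean(xs):
    return sum(xs) / len(xs)

print(f"treated vs. neighbors: {mean(treated) - mean(neighbor):+.1f}  (effect + displacement)")
print(f"treated vs. remote:    {mean(treated) - mean(remote):+.1f}  (true effect)")
print(f"neighbors vs. remote:  {mean(neighbor) - mean(remote):+.1f}  (displacement)")

The first comparison overstates the net national gain (about +8 here, of which -3 is pure relocation); and if the "remote" group is itself subject to displacement, even the second comparison is biased.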

4.3 Lagging Regions and Big-Push Policies

Some regions lead development and others lag; indeed, it would be remarkable if all regions were to develop at the same rate. In the absence of policy, what happens to regions that are lagging? They often catch up as activity spreads out of an economic core – or spreads inland from the coast – as is documented in World Bank (2009). But in some cases, they fail to converge in this way. Many countries have regions with chronic problems, such as Brazil’s Nordeste, China’s Xinjiang region, or India’s state of Bihar, to name just a few.

What are the economic issues that prevent catch-up from being the usual outcome? As discussed in Sect. 2, factor mobility and intra-country institutional factors limit the magnitude of interregional factor price differentials, implying that the principal mechanisms of comparative advantage do not operate. Adjustment then takes the form of labor moving out of lagging areas, in which case, since there is no market failure directly associated with this mechanism, it is efficient to let the region contract. The other economic issue preventing catch-up is the propensity of economic activity to cluster and the difficulty of starting new centers. As we saw in Sect. 2, the productivity gains from clustering are a positive externality, giving rise to coordination failure and the first-mover problem and therefore suggesting a case for policy to decentralize activity from a congested city to a secondary or satellite city.

Given this, what is the case for policy intervention? It typically rests on one or more of the following arguments. First, the negative impact on people left behind. This can be acute in areas of absolute decline, where out-migration of the young (and possibly more skilled) leads to demographic imbalance and severe social deprivation. More generally, persistent inequalities raise issues of spatial equity and consequent distributional conflicts and other political ailments. Second, the argument is made in terms of excess (or over-rapid) expansion of booming areas, leading to congestion and pressure on housing and other assets. Third, some particular market failures may be holding a region back. Although these market failures may occur everywhere, they may be particularly harmful in lagging regions. A stronger version of this argument is that lagging regions may be stuck in a poverty trap.

Policies for lagging regions: Interventions are often part of ambitious multi-instrument programs that include (a) transport investments to improve connections within lagging regions and between lagging and more prosperous regions; (b) fiscal incentives and various direct service provisions; and (c) a package of measures that aim to foster skills, enterprise development, and innovation in specific parts of a country (local economic development policies). We focus our attention in this section on the use of multiple and potentially complementary instruments in ambitious regional development programs that attempt to make a large difference to economic outcomes reasonably fast, often referred to as “big-push” policies. The following subsection will discuss local economic development policies.

Before going further, we make several important notes of caution. The first is that knowledge about what works is weak. Second, prospects depend on many characteristics of the region that are not amenable to control by economic policy measures. Third, for treatment to be effective, it needs to be geographically selective. It follows that relative decline of some areas will, in many countries, be an inevitable part of national economic development.

Complementary policies and the big push: The usual justification for large comprehensive packages lies in either strong complementarities between policy instruments or the existence of local poverty traps (and often both). For instance, providing a transport link to a peripheral region in a developing country may not generate positive effects if local producers are unable to benefit from better access to new markets. This transport improvement thus needs to be supplemented by capacity building for local firms. More generally, in economies plagued by numerous market failures and inefficiencies, fixing a single problem offers no guarantee of improvement. This is the classic second-best argument of Lipsey and Lancaster (1956). To take a simple example, tolling a major road to avoid congestion may push traffic onto secondary roads and increase the number of accidents. The social loss caused by more accidents may dominate the congestion gain, resulting in a net social loss.
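A back-of-the-envelope version of this second-best example, with invented numbers: drivers split between a congestible tolled route and a flat-cost route whose accident externality they ignore.

def equilibrium_split(toll, total=100.0, a=1.0, b=0.02, c_b=2.4):
    # Drivers allocate themselves until private costs equalize:
    #   route A: a + b * x_a + toll   (congestible, tolled)
    #   route B: c_b, flat            (each trip also causes an accident
    #                                  externality that drivers ignore)
    x_a = max(0.0, min(total, (c_b - a - toll) / b))
    return x_a, total - x_a

def social_cost(x_a, x_b, a=1.0, b=0.02, c_b=2.4, accident_ext=1.0):
    # Total resource cost: travel costs plus the uninternalized accident
    # externality on route B (the toll itself is a transfer, not a cost).
    return x_a * (a + b * x_a) + x_b * (c_b + accident_ext)

for toll in (0.0, 0.6):
    x_a, x_b = equilibrium_split(toll)
    print(f"toll {toll:.1f}: route A {x_a:.0f} trips, route B {x_b:.0f} trips, "
          f"social cost {social_cost(x_a, x_b):.0f}")

With the accident externality at 1.0, the toll raises social cost from 270 to 276: the diversion loss outweighs the congestion gain. Setting accident_ext=0.0 reverses the verdict (240 falls to 216), which is exactly the second-best point: the value of fixing one distortion depends on the others left in place.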
The main challenge with policy packages that hope to build on complementarities between instruments is that they require an extremely detailed understanding of what the frictions and market failures are and how they interact. In some cases, these complementarities seem obvious. For instance, both electricity and market access are necessary for an export-processing zone to be successful. Other interactions are much less well understood, including those between hard infrastructure such as transport improvements and softer interventions such as local economic development (LED) policies. While being deep in the second-best offers no guarantee that remedying one market failure will generate an improvement, neither does it guarantee that fixing two market failures will do better.

Turning to multiple equilibria, we note that being able to move from a low to a high equilibrium sounds extremely attractive, since a temporary intervention may be able to make a permanent change. Even more attractive, this move could potentially be achieved at relatively low cost even when the high equilibrium is much more desirable than the low equilibrium. This is the logic that some policy makers appear to have in mind when they combine local capability development, support for higher education, and relocation incentives for technology firms in the hope of creating a “transformative” high-tech cluster. A key question with multiple equilibria is how to move from one equilibrium to the other. In the absence of frictions, a shock is needed to take the economy out of the basin of attraction of the low equilibrium and into the basin of attraction of the high equilibrium. With transitional frictions, such as the cost of rural-urban migration or the inability of an industrial sector to absorb new workers fast, the situation is more complicated. The frictions may be large enough to trap the economy in the low equilibrium. When frictions are smaller, expectations become fundamental (Krugman 1991; Matsuyama 1991). This opens a role for the government to coordinate expectations, overcoming the first-mover problem and facilitating private sector investment.

Evidence on big-push policies and multiple equilibria – quantity and valuation effects: Examples of big-push interventions from the developing world abound. For instance, the Upper Egypt Local Development Program targets two provinces with a variety of interventions involving infrastructure development, the construction of special economic zones, and a variety of support programs for both private and public sectors. Argentina has recently launched its Plan Belgrano for its ten (lagging) northern provinces, with a mix of infrastructure, private sector development measures, housing construction, and increased childcare provision. While ambitious in its purpose, the investments proposed by the Plan Belgrano represent only about 0.25% of the country’s GDP annually for a period of 10 years.

This type of wide-ranging intervention is extremely hard to assess empirically. A first possibility is to try to assess every single instrument individually, but this would be daunting. It would also be contradictory, since the rationale for these instruments relies precisely on the complementarities between them. A second possibility is to take a more aggregate approach and only examine final outcomes, such as overall employment or GDP per capita in the treated regions relative to the untreated. The problem is then that these aggregate outcomes may have been affected by other aggregate changes in the economy unrelated to the policies at hand. Displacement will also affect “untreated” regions.
Regardless of the approach taken, the difficulties of the evaluation are further compounded by the wide geographical dispersion of the investments, the modest amounts being invested, and the ongoing nature of many of these policies with few abrupt changes. This is arguably why there is little research on these projects and why most extant research struggles to provide solid conclusions (Neumark and Simpson 2015).

The main exception is the wide-ranging evaluation of the Tennessee Valley Authority (TVA) program by Kline and Moretti (2013b). Like the interventions described above, the TVA had several components, mainly energy generation (through the construction of numerous dams), transport (through the development of roads and canals), and education (through the construction of new schools). For evaluation purposes, the TVA has the advantage of being well circumscribed in time (starting in 1933 but with the bulk of the investments made in the 1940s and 1950s) and geographically (163 counties across four states in the Appalachia region). While small relative to the scale of the US economy, the transfers were substantial for the treated counties, up to 10% of local incomes at the beginning of the 1950s.

Kline and Moretti (2013b) find that, relative to their control group, TVA counties enjoyed higher growth in manufacturing employment of 5–6% per decade and lower growth in agricultural employment. Much of the growth in manufacturing occurred during the treatment period, while the decline in agricultural employment occurred after 1960 and the end of transfers. While median family incomes in TVA counties increased by about 2.5% per decade relative to control counties, manufacturing wages did not increase. This suggests that the main effect of the program in the treated counties was to industrialize them by shifting labor away from agriculture into manufacturing, where wages were higher.

An important caveat when doing an aggregate evaluation of this type is that displacement effects cannot be tracked directly. Just as with corridor projects, one would need to observe “control” counties had the TVA program not taken place, which is of course impossible. With this problem in mind, Kline and Moretti (2013b) impose some theoretical structure to interpret their findings and indirectly estimate displacement effects. A key challenge for future research will be to assess the sensitivity of this type of analysis to the details of the model at hand. Importantly, Kline and Moretti (2013b) show that the effects on manufacturing employment of the TVA program can still be observed in 2000, about 40 years after federal transfers stopped. This suggests a change of equilibrium in which a permanently higher manufacturing specialization can be sustained through agglomeration effects.

In conclusion, the empirical evidence invites caution with respect to big-push types of initiative. First, existing equilibria often appear extremely resilient, so that quantity changes are hard to generate through a change of equilibrium. Second, the absence of “non-linearities” in agglomeration effects is consistent with quantity changes being of low, perhaps zero, value.
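A minimal sketch of the tipping logic underlying big-push arguments (hypothetical numbers, not calibrated to any program discussed here): with investment complementarities there are two stable equilibria, and a temporary subsidy produces a permanent change only if it is large and long enough to push the economy across the basin boundary; otherwise the existing equilibrium reasserts itself.

def simulate(subsidy=0.0, subsidy_periods=0, x0=0.05, periods=400, speed=0.5):
    # Share x of firms investing. The private return rises with others'
    # investment (return = -0.2 + 0.6 * x), so x = 0 and x = 1 are stable
    # equilibria and x = 1/3 is the tipping point between their basins.
    x = x0
    for t in range(periods):
        r = -0.2 + 0.6 * x + (subsidy if t < subsidy_periods else 0.0)
        x = min(max(x + speed * r * x * (1.0 - x), 0.0), 1.0)
    return x

print(f"no subsidy:            x -> {simulate():.2f}")
print(f"small, short subsidy:  x -> {simulate(subsidy=0.1, subsidy_periods=20):.2f}")
print(f"large, longer subsidy: x -> {simulate(subsidy=0.3, subsidy_periods=80):.2f}")

Only the last run crosses the x = 1/3 threshold before the subsidy expires, so the high-investment equilibrium persists after the intervention stops; the first two runs fall back to the low equilibrium, echoing the resilience noted above.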

4.4 Local Economic Development Policies

The usual justification for local economic development interventions and support for innovation is that some regions have insufficient "capacities" because of a lack of investment. The objective of the policy is then to foster capacity development, either directly through the provision of skills and advice to firms or indirectly through fiscal and monetary incentives. The key justification for investment subsidies or direct investment by the government lies in the existence of a wedge between private and social returns to investments. For instance, workers may underinvest in skills and education because of human capital externalities. In developing countries, credit constraints may further impair investments in education and skills. A lack of information may also be invoked. For innovation, the justification is even more commonly accepted, given the public good nature of new knowledge.

A clear limitation of this type of policy is that underinvestment in skills and various forms of capital may be a symptom of a deeper problem rather than the problem itself. Economic agents choose to invest little when private returns are low, but these returns may be low for many different reasons. In some cases (such as human capital externalities), private returns may be low while social returns are high. In other situations (such as when jobs are allocated according to group membership), both private and social returns are low. In these cases, subsidizing investment in skills is wasteful, as the fundamental problem is elsewhere. The general point is that setting the appropriate subsidy requires diagnosing what the market failure is and thereby knowing the wedge between private and social returns.

Diagnosis of market failures may be hard, as they often arise in related markets. An example is enterprise development and entrepreneurship decisions. Consider a simplified situation where an efficient allocation of talent requires that the most talented people become entrepreneurs. However, entrepreneurship is highly risky, and there is no insurance market for entrepreneurial risk. This leads to too few entrepreneurs and to misallocation, where highly talented but also highly risk-averse individuals may choose to remain in occupations where their contribution to social welfare is low. In developing countries, other frictions leading to potentially severe misallocation loom large. Credit constraints are one of them, and so too are cultural and religious traditions that reserve occupations for particular groups and bar others.

While a strong case can be made for subsidies to skill acquisition, enterprise development, and innovation, complications arise when thinking about such policies at the subnational level. First, why should those policies be conducted locally and/or for only a part of a country? The market failures we just described are general. While it can be argued that specific contexts may call for specific responses, this does not justify a specific focus on lagging regions. It could be that underinvestment in capacity is worse in lagging regions, but this needs to be justified and explained; we may expect social returns to investment in skills to be larger than private returns everywhere in a country. It is unclear why this wedge between private and social returns should be larger where private returns are the lowest. There are situations where this may be the case. For example, similar matching frictions on the labor market will disproportionately affect regions where productivity is lower, and this may justify labor market policies specifically targeted to less productive regions (Kline and Moretti 2013a).
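The wedge logic in the preceding paragraphs can be stated compactly. The notation below is ours, introduced only for illustration:

```latex
% Subsidy logic (illustrative notation): r_p = private return,
% r_s = social return to a marginal investment in skills or capital.
% The corrective (Pigouvian) subsidy equals the wedge:
\[
  s^{*} \;=\; r_s - r_p .
\]
% If r_p is low because of a distortion that also depresses r_s
% (e.g., jobs allocated by group membership), the wedge s* is small
% and subsidizing the investment is wasteful: the fundamental problem
% lies elsewhere.
```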
Credit constraints may affect entrepreneurs and enterprise development disproportionately in poorer regions, but there are also situations where the opposite may hold, and more subsidies may be needed in richer regions. For instance, subsidies to innovation will arguably be much more effective when distributed to firms that are at the technological frontier than when given to firms that are well within that frontier. The broader question behind this discussion is to what extent different policies are needed in different subnational contexts, relative to national policies for which one size should fit all.

Human capital and enterprise development in lagging regions: Research strongly suggests that more human capital has large local productivity effects, which can be tracked through higher wages and higher population growth. Most of this work has been done in developed economies, but research in developing countries suggests similar, possibly larger, effects for China, India, and much of Latin America (Duranton 2016; Chauvin et al. 2017; Ferreyra 2017). For LED policies, these findings are good news, since relatively small changes in the composition of human capital can potentially generate large changes in wages and employment. This is only half of the story, however. The other half concerns the ability of LED policies to change the skill composition of the workforce, and here the news is much less positive. There is a large literature looking at human capital policies at the country level, which is beyond our scope. There is also a large literature that evaluates a wide variety of training programs, primarily in developed countries but sometimes also in developing countries. While results vary, overall the returns to training and other localized human capital interventions are found to be low and sometimes even negative (Card et al. 2010).

A particular concern with human capital policies conducted locally is the possibility that educated or trained workers migrate away. While outward mobility is likely to greatly attenuate the local quantity effects of human capital policies, there is clearly social value in raising the human capital of workers and their ability to produce, wherever they may go. Put differently, with local human capital interventions, place-based objectives and (people-based) returns may be at odds. This said, while we do not dismiss the issue, we note that much of the development literature highlights how small internal migration flows are in many developing countries (e.g., Munshi and Rosenzweig 2016).

Turning to entrepreneurship and enterprise development, there is again research strongly suggesting positive effects of local entrepreneurs on local growth. Again, much of that evidence comes from developed countries, and less is known about developing countries. Support for a positive role of entrepreneurship in subsequent local growth is provided by Ghani et al. (2013) for India and Duranton and Martin (2018) for Colombia. Just as with human capital, a case can be made that more entrepreneurship may help lagging regions. The issue is then whether entrepreneurship and firm development can actually be fostered in developing countries, especially in their lagging regions. Evidence on the subject is sparse, although McKenzie and Woodruff's (2014) careful review of private sector support in developing countries is not particularly encouraging. The effects of the enterprise development schemes that have been assessed are often statistically insignificant because they are based on samples that are too small. While there are cases of interventions that appear to have made a difference for the treated firms, such as the intense involvement of management consultants with a small number of Indian textile firms reported by Bloom et al. (2013), it is difficult to be optimistic about the vast majority of the cases reviewed by McKenzie and Woodruff (2014). After some initial changes in their operations, treated firms often revert to their old ways. Even in cases where firms operate substantially and permanently better after some training, it is unclear whether the economic gains for the treated firms are worth the cost of the intervention, as it is hard to track the operating profits of firms.

Just as with other place-based policies, displacement is also a serious concern. A large growth in the revenue of treated firms may be associated with only minimal productivity gains that allow those firms to expand at the expense of others in highly competitive markets. A training program may even generate negative net benefits if the wrong firms are treated. Beyond these considerations, it is also often unclear which market failure is being solved when providing firms with free training that they would otherwise not purchase. Self-employed workers may of course be credit-constrained, but then the better response would be to tackle the credit constraints directly. While we appreciate that developing economies operate deep in the second-best, it is nonetheless fundamental to assess the main market failure at play. Besides credit constraints, another possible reason for firms not purchasing training may be a lack of information about its benefits. If this is the case, informing firms about the benefits of training may be far cheaper than providing the training for free, and it may lead to the provision of more relevant training, subject to a market test.

From this we conclude that while changes in human capital, entrepreneurship, and firm management in an area can have sizeable effects on final outcomes of interest, such as productivity and employment, the literature is not encouraging when it comes to the ability of policies to foster these changes. In a way, our conclusions regarding human capital and enterprise development are the opposite of the conclusions we drew regarding corridors. Corridors often generate large quantity responses, but their social value is unclear. Human capital and enterprise interventions may be of high social value, but changing quantities appears extremely hard.

5 Concluding Comments

Policy makers, particularly those in developing countries, have to make place-based decisions about infrastructure and service provision and are faced with the challenges of uneven development and lagging regions. These decisions should be informed by economic appraisal based on rigorous research, but research in this area is challenging. Analytically, it is clear that there are many market failures and a potential role for policy. But the effects of policy are sometimes ambiguous in theory and always depend on the details of the local situation. Even if the quantitative effects can be established, their valuation requires understanding the nature and extent of inefficiencies and market failures in the local and national economy. The study of place-based policy is a booming area of empirical research, but this is even more challenging, as it is hard to establish counterfactuals and identify the causal effects of policy.

These conclusions should not, however, be taken as implying that we can say nothing. Too often in this area, policies have gone unchallenged and large sums of money have been wasted. A rigorous framework is needed in which the range of likely effects can be explored and arguments for benefits can be challenged. Empirical work is gradually building a stock of knowledge about what does not work as well as what does.

6 Cross-References

▶ Clusters, Local Districts, and Innovative Milieux
▶ Demand-Driven Theories and Models of Regional Growth
▶ European Regional Policy: What Can Be Learned
▶ Geographical Economics and Policy
▶ Infrastructure and Regional Economic Growth
▶ Regional Growth and Convergence Empirics
▶ Special Economic Zones and the Political Economy of Place-Based Policies
▶ The Case for Regional Policy: Design and Evaluation

References

Berg C, Deichmann U, Liu Y, Selod H (2017) Transport policies and development. J Dev Stud 53(4):465–480
Bloom N, Eifert B, Mahajan A, McKenzie D, Roberts J (2013) Does management matter? Evidence from India. Q J Econ 128(1):1–51
Card D, Kluve J, Weber A (2010) Active labour market policy evaluations: a meta-analysis. Econ J 120(548):F452–F477
Chauvin JP, Glaeser E, Ma Y, Tobio K (2017) What is different about urbanization in rich and poor countries? Cities in Brazil, China, India and the United States. J Urban Econ 98(C):17–49
Combes P-P, Duranton G, Overman HG (2005) Agglomeration and the adjustment of the spatial economy. Pap Reg Sci 84(3):311–349
Combes P-P, Gobillon L (2015) The empirics of agglomeration economies. In: Duranton G, Henderson V, Strange W (eds) Handbook of regional and urban economics, vol 5A. Elsevier, Amsterdam, pp 247–348
Duranton G (2016) Determinants of city growth in Colombia. Pap Reg Sci 95(1):101–132
Duranton G, Martin D (2018) Entrepreneurship, specialization, diversity, and sectoral growth in Colombian cities. Mimeographed in progress, Wharton School of the University of Pennsylvania
Farole T (2011) Special economic zones in Africa: comparing performance and learning from global experience. No. 2268 in World Bank publications. The World Bank, Washington, DC
Ferreyra MM (2017) Human capital in cities. Mimeographed, World Bank
Foster V, Briceno-Garmendia C (2010) Africa's infrastructure: a time for transformation. World Bank, Washington, DC
Fujita M, Krugman P, Venables AJ (1999) The spatial economy: cities, regions and international trade. MIT Press, Cambridge, MA
Ghani E, Kerr WR, Tewari I (2013) Specialization, diversity, and Indian manufacturing growth. Mimeographed, Harvard Business School
Gollin D, Lagakos D, Waugh ME (2014) The agricultural productivity gap. Q J Econ 129(2):939–993
Kim S (2008) Spatial inequality and economic development: theories, facts, and policies. In: Spence M, Annez PC, Buckley RM (eds) Urbanization and growth. World Bank, Washington, DC
Kline P, Moretti E (2013a) Place based policies with unemployment. Am Econ Rev Pap Proc 103(3):238–243
Kline P, Moretti E (2013b) Local economic development, agglomeration economies, and the big push: 100 years of evidence from the Tennessee Valley Authority. Q J Econ 129(1):275–331
Krugman P (1991) History versus expectations. Q J Econ 106(2):651–667
Lipsey RG, Lancaster K (1956) The general theory of second best. Rev Econ Stud 24(1):11–32
Matsuyama K (1991) Increasing returns, industrialization, and indeterminacy of equilibrium. Q J Econ 106(2):617–650
McKenzie D, Woodruff C (2014) What are we learning from business training and entrepreneurship evaluations around the developing world? World Bank Res Obs 29(1):48–82
Munshi K, Rosenzweig M (2016) Networks and misallocation: insurance, migration, and the rural-urban wage gap. Am Econ Rev 106(1):46–98
Neumark D, Simpson H (2015) Place-based policies. In: Duranton G, Henderson JV, Strange WC (eds) Handbook of regional and urban economics, vol 5B. Elsevier, Amsterdam, pp 1197–1287
Redding SJ, Turner MA (2015) Transportation costs and the spatial organization of economic activity. In: Duranton G, Henderson V, Strange WC (eds) Handbook of regional and urban economics, vol 5B. Elsevier, Amsterdam, pp 1339–1398
World Bank (2009) World development report 2009: reshaping economic geography. World Bank, Washington, DC

New Trends in Regional Policy: Place-Based Component and Structural Policies

José-Enrique Garcilazo and Joaquim Oliveira Martins

Contents
1 Introduction: Regional Disparities and Place-Based Policies
2 Three Main Regional Scales and Their Characteristics
2.1 Tertiarization of Economies and Competitive Edge of Large Cities
2.2 Rural Regions Close to Cities and Agglomeration Effects
2.3 Heterogeneity of Remote Rural Regions and Tradable Activities
3 Differentiated Regional and Urban Policy Responses
3.1 Policy Responses in Cities
3.2 Policy Responses in Rural Areas Close to Cities
3.3 Policy Responses in Remote Rural Regions
4 Conclusion: An Evolving Regional Policy Paradigm
5 Cross-References
Annex: A Typology to Classify TL3 Regions
References

Abstract

This chapter describes and discusses a regional development policy approach for national productivity and growth based on three regional layers: (i) large metropolitan areas; (ii) rural/intermediate regions close to cities; and (iii) remote rural regions. First, we analyze the trends of tertiarization of OECD economies and their relation to the increased dominance of large cities. Over the long run, the economic space has been progressively structured around these large cities, their hinterlands, rural areas close to cities, and the remote rural areas. The chapter then focuses on the appropriate regional policies for each of the regional layers. Against this background, in a final section, the chapter discusses the evolution of the place-based regional policy paradigm in the OECD, in particular its recent focus towards wellbeing.

Keywords

Place-based policies · Regional and urban growth · Lagging regions

J.-E. Garcilazo
Centre for Entrepreneurship, SMEs, Regions and Cities, OECD, Paris, France
e-mail: [email protected]

J. Oliveira Martins (*)
Centre for Entrepreneurship, SMEs, Regions and Cities, OECD, PSL-University of Paris Dauphine, Paris, France
e-mail: [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_139

1 Introduction: Regional Disparities and Place-Based Policies

Understanding economic and social trends at the national level, their drivers, and their bottlenecks has been the focus of much of the economics literature since the postwar period, and this in turn has dominated much of the policy debate on economic development. Yet national average figures mask important differences within countries. For instance, the national unemployment rate in Spain, which by 2016 stood at 13%, does not convey the important differences in labor market conditions across regions, which range from a low unemployment rate of 4% in Navarra to the highest rate of 25% in Andalucía. These differences reflect a better functioning labor market in Navarra, which will likely lead to higher productivity, wages, and quality of life than in Andalucía. National or uniform policy responses that do not account for the main drivers and bottlenecks behind these differences may fall short of providing efficient responses.

On the theoretical front, advancements in the New Economic Geography (NEG), the urban agenda, and the new trade theory have given important insights into why economic activity and settlement patterns tend to concentrate in specific geographies within countries and not in others. The NEG provides a theory of how and why economic activity concentrates in certain locations, generating core-periphery spatial patterns. The model is based on a spatial equilibrium between the benefits and costs of agglomeration. Over the long run, it can explain how economic divergence may steadily increase. Other effects related to endogenous growth and institutional economics may reinforce these spatial outcomes (Acemoglu and Dell 2010).

At the more normative level, national policy makers have paid more attention to territories and to understanding regional variations beyond national averages, their relation to national performance, and their main drivers. In turn, this has motivated the economic literature to understand and provide a rationale for spatial development policies tailored to specific places (e.g., regions and cities) or geographies. These policies are usually called place-based policies (hereafter, PBPs). They stand in contrast to traditional structural policies (product, labor markets, education, health, etc.), which are designed to increase the aggregate growth potential of an economy and typically have uniform or nationwide coverage.

The debate about spatially blind or place-neutral policies versus PBPs has focused on the optimal degree of concentration of economic activities. According to Barca et al. (2012), tapping into unused potential in intermediate and lagging areas is not detrimental to aggregate growth but can actually enhance growth at both the local and the national level. Nonetheless, evidence on the actual impact of place-based policy interventions has remained mixed (cf., for example, the recent meta-analysis provided in Dall'Erba and Fang (2017) for the case of EU Structural Funds). Duranton and Venables (2019) provide a comprehensive discussion of the factors that may justify a place-based approach. According to these authors, these factors are related to the value of proximity and connectivity, the strong complementarity of the elements determining local or regional performance, and the existence of asymmetric shocks across the regions of the same country. Given the many market failures that may be associated with spatial development, such as excessive concentration of investments or vicious local underdevelopment traps, there may be a role for policy interventions. Focusing on growing regional inequalities and the rationale for policy intervention, Iammarino et al. (2019) argue that "both mainstream and heterodox theories have gaps in their ability to explain the existence of these different regional trajectories and the weakness of the convergence processes among them." Given the limited scope for factor mobility to address regional disparities and spread prosperity, they call for a new development policy, which again is "place-sensitive."

To design PBPs and structure our understanding of economic space, an important consideration lies in identifying the right geographic scale for analysis and policymaking. Focusing the analysis at a very small scale (e.g., the municipal level) might lead to a very complex and fragmented unit of analysis, failing to capture the labor market dynamics and economies of scale necessary for economic development. Between the local and national scales, there is a wide range of geographies across countries. In order to improve international comparability, the OECD has developed two levels of regional geographies: large regions (Territorial Level 2, TL2), which broadly correspond to the first tier of sub-national governments below the national scale, and small regions (Territorial Level 3, TL3), which stand between TL2 regions and the local level. Small regions in some countries correspond to administrative regions but are largely statistical regions. Large regions typically include a mixture of metropolitan cities, peri-urban areas, second-tier cities, and rural areas, in contrast to smaller regions, which tend to be more clearly differentiated by type.

For international comparability, the OECD has developed several taxonomies over the course of the years, classifying regions with similar characteristics. The first taxonomy, developed in 1994 and labeled the OECD regional typology, classifies small regions as predominantly urban, intermediate, or predominantly rural, according to a population density threshold for local units and the share of the region's population living in low-density local units. Over the years, this classification was extended, first by sub-classifying rural regions into two sub-groups, rural regions close to cities and remote rural regions, based on a driving distance to an urban centre (a 1-hour travel time). A second modification to improve comparability replaced the arbitrarily defined local units with grid cells.

A second taxonomy focused on defining comparable metropolitan cities and their areas of expansion. This definition, labeled functional urban areas (hereafter, FUAs), classifies cities and their broader areas of influence based on commuting patterns. An FUA is constructed by concatenating grid cells with high population density (typically above 1,500 inhabitants per km²) into an urban core. These cells are then connected with surrounding lower-density cells when commuting flows between the two types of cells exceed a given threshold (typically, at least 15% of the labour force commutes to the urban core). The urban core and its hinterland together compose the so-called functional urban area (a minimal sketch of this construction is given at the end of this section).

The two territorial definitions lead to different analytic frameworks. The TL3 and TL2 regions cover the entire territory within countries, while FUAs capture only a sub-sample of the territory. Furthermore, this typology may lead to a certain dichotomy between urban and rural areas. Against this backdrop, the OECD has recently developed an alternative definition introducing some spatial continuity between metropolitan and non-metropolitan areas. It distinguishes five types of regions according to the share of the population living in metropolitan areas and an accessibility criterion: (i) large metropolitan regions; (ii) metropolitan regions; (iii) non-metropolitan regions close to large cities; (iv) non-metropolitan regions close to medium-sized cities; and (v) remote regions.

These definitions, initially elaborated for international comparability, are also important tools for policymaking. For example, among the most relevant policies for cities are mobility policies, spatial planning, and housing, and it is critical to design and coordinate them at the right scale. FUAs provide a relevant geographic scale for coordinating these three policy areas, as they capture the entire local labor market of cities and their wider area of expansion. The governance system needs to be adapted to this scale, notably through metropolitan governance bodies or other coordinating mechanisms. The TL3 definition is also relevant for rural policies, since it differentiates among different types of rural regions, those close to cities and those that are remote. Rural regions close to cities require a much stronger integration of policies with cities in areas such as transportation, land use, labor markets, or housing, amongst others. In contrast, remote rural regions may require quite different policy responses that address their particularities. Spatial scales are thus critical tools for the design of regional policies.

The subsequent sections are structured as follows. First, we analyze the trends of tertiarization in developed economies and their relation to the increased dominance of large cities. Over the long run, the economic space has been progressively structured around these large cities, their hinterlands, rural areas close to cities, and the remote rural areas. The chapter then focuses on the appropriate regional policies for each of the regional layers. In the final section, we discuss the evolution of the place-based regional policy paradigm, in particular its recent focus towards wellbeing.
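As an illustration of the FUA construction described above, the following sketch applies the two thresholds (a density cutoff of 1,500 inhabitants per km² for core cells and a 15% commuting share for hinterland cells) to a toy set of grid cells. The data structure, field names, and single-pass logic are our own simplifications, not the OECD implementation, which works on full population grids, requires contiguity of the dense cells, and iterates until the commuting zone stabilizes.

```python
# Minimal sketch of the functional urban area (FUA) construction:
# (i) cells denser than 1,500 inhabitants/km^2 form the urban core;
# (ii) remaining cells join the hinterland when at least 15% of their
# resident workers commute to the core. Toy data, illustrative only.

CORE_DENSITY = 1500      # inhabitants per km^2
COMMUTE_SHARE = 0.15     # minimum share of workers commuting to the core

cells = [
    # id, population density, share of resident workers commuting to core
    {"id": "A1", "density": 4200, "commute_to_core": 1.00},
    {"id": "A2", "density": 2100, "commute_to_core": 1.00},
    {"id": "B1", "density": 300,  "commute_to_core": 0.32},
    {"id": "B2", "density": 120,  "commute_to_core": 0.18},
    {"id": "C1", "density": 45,   "commute_to_core": 0.04},
]

core = [c["id"] for c in cells if c["density"] >= CORE_DENSITY]
hinterland = [
    c["id"] for c in cells
    if c["density"] < CORE_DENSITY and c["commute_to_core"] >= COMMUTE_SHARE
]
fua = core + hinterland

print("urban core:", core)        # ['A1', 'A2']
print("hinterland:", hinterland)  # ['B1', 'B2']; C1 is excluded
print("FUA:", fua)                # core plus commuting zone
```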

2 Three Main Regional Scales and Their Characteristics

National growth and productivity strategies are usually based on macro-structural or sectoral policies. Such structural policy packages often focus on different markets (product, labor, etc.), aiming at increasing the structural flexibility of economies. In developing and emerging countries, growth strategies may also focus on the role of selected sectors that could act as engines of productivity and job growth. While these policies can be designed in a piecemeal or a multi-dimensional manner, they all share a common characteristic: they are typically undifferentiated across geographies or spatial scales. They are space-blind.

However, policies can be uniform yet not spatially neutral: they may produce very asymmetric effects across territories. Think, for example, of trade liberalization policies. Typically, large cities are dominated by service sectors, which are much less exposed to international competition than agriculture or industry. Intermediate and rural areas may therefore feel the effect of trade liberalization much more acutely than urban dwellers. At the same time, their capacity to adjust and create new jobs may also be more limited. Perceptions of the effects of globalization can therefore be dramatically different. The current discussion about the "geography of discontent" can be related to these differences (Los et al. 2017; McCann 2018).

To different degrees, depending on their state of development, all countries are affected by these spatial asymmetries. In particular, the excessive population growth observed in large cities of many developing countries is driven by relative development gaps between rural and urban areas. Maintaining a certain level of development in rural areas is a way of managing the flow of migration to large cities.

Policy-makers could therefore focus on a national spatial productivity strategy with three main spatial layers. The first layer exploits agglomeration economies in large and dense urban areas, notably in service sectors. The second layer promotes regional productivity catching-up in intermediate/rural regions close to cities, where proximity and tradable sectors play a key role. The third layer addresses the specific problems of remote rural areas through place-based approaches (e.g., smart specialization). We now discuss which policies are best adapted to each of these territorial scales.

2.1 Tertiarization of Economies and Competitive Edge of Large Cities

Globalization during the past decades has reshaped global institutions, economic structures, and societies. The offshoring of manufacturing jobs to emerging economies with cheaper labor costs (e.g., China, India) has boosted the relevance of emerging economies, and with it new international institutions, mainly the G20, have come to the forefront in shaping international policy agendas. A growing body of literature has also documented the emergence of global value chains, in which the production of tradable goods has gradually decoupled from a centralized location into a multiplicity of locations. Labor-intensive production tasks have mainly been delocalized to emerging economies where labor costs are cheaper, while capital-intensive ones, including R&D activities and services, have remained in the home countries of multinational firms.

The delocalization of industrial activity from developed to developing countries has contributed to the tertiarization of economic activities in OECD countries, in which the relative share of services – both as a proportion of total output and of employment – has increased during the past decades. Services currently represent 80% or more of value added across OECD countries. This structural transformation has not been neutral in space. The largest share of employment in services is found in large metropolitan areas, as shown in Fig. 1. In this and subsequent figures, the TL3 regions are classified into five regional types, as follows (see the Annex):

• Large metro areas
• Metro areas
• Non-metro areas with access to metro areas
• Non-metro areas with access to a small/medium city
• Remote non-metro areas

Indeed, manufacturing has ceased to be the economic base of large cities, giving way to service-oriented activities, which require proximity, a pool of specialized labor, and access to capital and knowledge networks, all found in cities, especially large ones.

Fig. 1 Share of employment in services (%) by type of TL3 region (large metro; metro; non-metro with access to a metro; non-metro with access to a small/medium city; non-metro remote), OECD, 2008–2016. Note: Switzerland, Japan, Norway, Australia, and Korea are not included in the calculation because of data availability. (Source: OECD Regional Statistics Database. https://doi.org/10.1787/region-data-en)

Fig. 2 Population change (percentage points) by type of region (metro vs. non-metro), 2001–2017. (Source: OECD Regional Statistics Database. https://doi.org/10.1787/region-data-en)

The concomitant gradual tertiarization of OECD economies and increased competition in tradable goods from emerging economies have favored concentration in cities at the expense of remote and low-density regions. This tertiarization of the economies has accentuated regional disparities (Odendahl et al. 2019). The continued urbanization is illustrated by population change, which continues to favor metropolitan areas (Fig. 2).

Cities have a greater capacity than low-density economies to transform the tertiarization of economic activities into productivity gains, given that the productivity of services crucially depends on density (Morikawa 2011). Furthermore, cities enjoy the benefits of agglomeration economies. Firms tend to locate close to other firms and to densely populated areas due to lower transportation costs, proximity to markets, and wider availability of labor supply. People are also attracted to densely populated areas by the wider availability of job opportunities, goods, and services. These mutually reinforcing forces yield an important economic premium for both consumers and firms through economies of scale, a better matching and functioning of labor markets, spillover effects, and greater technological intensity (Duranton and Puga 2004). Productivity and, therefore, wages tend to be higher in densely populated areas. On average, a worker's wage increases with the size of the city where he or she works, even after controlling for worker attributes such as education level. OECD (2015) estimates suggest that the agglomeration benefit in the form of a wage premium rises by 2–5% for a doubling of population size (see the illustrative formulation below). These benefits also yield higher productivity levels for the broader metropolitan region. On average, output per capita in large metro regions is almost 30% higher than in metro regions, 36% higher than in non-metro regions close to metro regions, and 50% higher than in remote regions and non-metro regions close to small and medium cities (Fig. 3).

The main idea behind the New Economic Geography is to explain why consumers and firms tend to agglomerate in geographic areas where other firms and consumers are already located. Simply stated, firms like to locate where other firms and suppliers are located, given the lower transportation costs. They also like to locate where consumers and densities are higher, especially service-oriented firms. Workers in turn like to locate close to firms, given the greater job opportunities available. Studies of this phenomenon include Perroux's notion of "growth poles" (Perroux 1955), Myrdal's analysis of "circular and cumulative causation" (Myrdal 1957), and Hirschman's concept of "forward and backward linkages" (Hirschman 1958). Demographic patterns across OECD countries over the past 16 years confirm these circular and cumulative causation dynamics. The share of the population living in metro regions increased, at the expense of non-metropolitan regions, in all but three OECD countries (Greece, Belgium, and the United Kingdom). On average, the increase in the metro share is close to two full percentage points among the 30 OECD countries considered (Fig. 2).

In contrast to this structural trend of tertiarization and the associated benefits brought by cities, low-density economies have faced competitive pressures in tradable goods from low-wage emerging economies over the past years. Low-density regions have thinner and more fragmented internal markets; their productivity gains in services are thus limited, and they must export to markets elsewhere to raise their productivity. These forces have consequently led to the creation of new, often higher-paid jobs in cities and the destruction of jobs or readjustment of wages in rural areas due to global competition in tradable goods and services.

Beyond this structural transformation, many countries have also faced the shock and the aftermath of the global financial crisis during the last decade, which has not been neutral in space. Low-density regions produce a limited range of goods and services, which implies greater vulnerability to sector-specific shocks, whether positive or negative. In a very large, dense economy, the greater range of activities typically offers a greater degree of resilience to external shocks. The effects of the crisis appear as negative growth rates in GDP per capita from 2009 to 2015 across all regional categories (see the slopes in Fig. 3). The declines are much larger for remote rural regions and non-metro regions close to medium and small cities than for non-metro regions close to metro regions and for metro regions. Indeed, metro regions and those close to them have weathered the effects of the crisis much better than regions far away from metro areas.
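To make the OECD (2015) wage premium estimate cited above concrete, one can write it in a constant-per-doubling form. This reading is our own illustration, not the specification of the underlying study:

```latex
% Agglomeration wage premium: each doubling of city population P
% raises wages w by beta, with beta estimated at 2-5% (OECD 2015).
\[
  \frac{w_1}{w_0} \;=\; (1+\beta)^{\log_2(P_1/P_0)},
  \qquad \beta \in [0.02,\ 0.05].
\]
% Example: a city four times larger (two doublings) carries a premium
% of (1+beta)^2 - 1, i.e., roughly 4-10%, other things equal.
```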

2.2 Rural Regions Close to Cities and Agglomeration Effects

Cities enjoy the benefits of agglomeration economies through a productivity premium. The benefits, however, must be weighed against the costs of densely populated areas, such as congestion, the negative social effects of a possible oversupply of labor, higher land and housing prices, rising inequality, and environmental pressures. The net impact varies from one urban area to another. These diseconomies can generate opportunities for lower-density areas close to cities. For example, lower housing prices can be traded off against higher transportation costs.

Fig. 3 Performance in GDP per capita in TL3 regions, 2003–2015: GDP per capita growth (%) in the pre-crisis (2003–2007) and post-crisis (2008–2016) periods plotted against GDP per capita levels (USD), by region type (metro, non-metro close to metro, non-metro remote). (Source: OECD Regional Statistics Database. https://doi.org/10.1787/region-data-en; Rural Policy 3.0, OECD (forthcoming))

In rural areas that are closer to cities, the urban and rural spaces can be highly interlinked across economic, social, and environmental dimensions (Fig. 4). Linkages can occur in many dimensions, including, amongst others, commercial ties, environmental goods, and population flows. Rural areas in close proximity to cities have much stronger linkages in transportation networks, commuting flows, spatial planning, and the provision of goods and services.

Fig. 4 Linkages between rural and urban regions within a functional region: economic structure (investments and economic transactions), spatial structure (population, human capital, and commuting; services provision; environmental goods and amenities; physical distance between urban and rural areas), and governance structure (governance interactions and partnerships). (Source: Rural and Urban Partnerships: An Integrated Approach to Economic Development, OECD, Paris)

Stronger linkages and the associated benefits are often referred to as "borrowed" agglomeration effects from neighboring cities. For a doubling of the population living – at a given distance – in urban areas within a 300 km radius, the productivity of the city in the centre increases by between 1% and 1.5% (OECD 2015). This may also explain why productivity in US cities generally increases more strongly with city size than in European countries: smaller cities in Europe are not that much disadvantaged, as they can simply "borrow" agglomeration from neighboring cities. In addition, they can offer higher environmental amenities than large cities and better services than remote rural regions. All these dimensions are important elements of quality-of-life standards and of the ability to attract a skilled labor force.

The labor productivity trends across non-metro regions show a clear pattern of convergence to national standards in both types of non-metropolitan regions close to cities over the period 2003–2015, against a diverging trend in remote non-metropolitan regions (Fig. 5). The divergence observed in remote regions may also be driven by their higher vulnerability to the effects of the global financial crisis, as previously described.

Fig. 5 Labor productivity trends in non-remote regions, 2003–2015: GVA per worker relative to the OECD average, by type of non-metro region (close to metro; close to small and medium cities; remote). (Source: OECD Regional Statistics Database. https://doi.org/10.1787/region-data-en; Rural Policy 3.0, OECD (forthcoming))
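The borrowed-agglomeration estimate cited above can be written in the same per-doubling form as the within-city wage premium of Sect. 2.1. Again, the functional form is our illustrative gloss on the OECD (2015) figure, not the study's specification:

```latex
% "Borrowed" agglomeration: doubling the population M living in urban
% areas within a 300 km radius raises the central city's productivity
% y by gamma, estimated at 1-1.5% (OECD 2015).
\[
  \frac{y_1}{y_0} \;=\; (1+\gamma)^{\log_2(M_1/M_0)},
  \qquad \gamma \in [0.01,\ 0.015].
\]
```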

2.3 Heterogeneity of Remote Rural Regions and Tradable Activities

Factors associated with economic geography are at the heart of economic development strategies in remote rural places. The geography of a place is effectively defined by a combination of physical ("first-nature") and human ("second-nature") geographies (Ottaviano and Thisse 2004). The more people inhabit a place, the more it is shaped by second-nature geography, that is, by human beings and their activities, or agglomeration economies. In contrast, when settlement is sparse, first-nature geography inevitably looms larger, which implies a larger role for natural factors in shaping economic opportunities. In other words, remote rural regions do not benefit from agglomeration economies and cannot develop large service sectors. They lack internal markets and must furthermore contend with physical distance to major markets. This increases travel times and shipping costs, which must be borne by the buyer (in the form of higher prices) or the seller (in the form of lower margins).

The economic structures of such places, however, often have very specific features. Production is concentrated in relatively few sectors, since it is impossible to achieve "critical mass" in more than a few activities, and the output of these activities must be exported to markets in other regions or countries, given the lack of internal markets. These features suggest that productivity in remote rural regions crucially depends on tradable activities or, alternatively, on an export-oriented economy. Activities in which the region holds a competitive or absolute advantage are the only sources of productivity gains and sustainable development. Given the higher transportation costs, tradable producers require an edge in terms of efficiency to offset the distance costs. Moreover, the limited scope for pursuing economies of scale in many sectors suggests that producers in the non-resource tradable goods sector need other sources of competitive advantage, e.g., focusing on unique qualities of products, where scarcity can add value.

Remote rural regions specialized in tradable activities, particularly in manufacturing, have, however, faced competitive pressures from developing countries. Globalization has brought increased competition to the tradable sector and especially to manufacturing, which includes pre-manufacturing services, manufacturing itself, and post-manufacturing services. This exposure to low-cost competition is illustrated in Fig. 6, which shows the relation between the share of each sector's full-time employment located in mostly urban areas (an indicator of urbanization) and average wages (a proxy for labor productivity). Most sophisticated manufacturing and service sectors are mainly located in urban areas, whereas low-cost, low-skilled manufacturing and natural resource activities are in non-urban (rural) areas. In this context, remote rural regions cannot build their future on low-cost, standardized production; rather, they need to transition from standardized to more differentiated, flexible, and customization-oriented types of production. Recent evidence suggests that most value is created in the downstream and upstream stages of manufacturing production cycles, implying the importance of product differentiation and innovation for moving up the value chain. This leads to a steeper U-shaped relation between value creation and stages of production compared with the traditional manufacturing value chain, a phenomenon Baldwin and Evenett (2015) labeled the "smile curve."

Fig. 6 The specialization of urban versus non-urban areas: average wage in 2010 EUR plotted against the percentage of each sector's full-time employment (FTE) located in mostly urban areas. High-wage clusters such as biopharmaceuticals, communications equipment and services, financial services, and business services are mostly urban, while fishing, forestry, agricultural inputs, apparel, and other low-wage clusters are mostly non-urban. (Source: Productivity and Jobs in a Globalised World: (How) Can All Regions Benefit? OECD, Paris)

In addition to facing competitive pressures, remote rural regions have a narrower economic base, which implies greater vulnerability to sector-specific shocks, whether positive or negative. Indeed, the trends in GDP per capita from 2003 to 2015 across the three types of non-metropolitan regions reveal an overall pattern of convergence, with a stronger pattern in the two types of non-metropolitan regions close to cities (Fig. 7). The data also show a much tighter distribution in non-metro regions close to metro regions, against a higher dispersion in non-metro regions close to small and medium cities and in remote regions. Still, a number of remote regions have achieved very high growth rates. In some cases, this is due to specific advantages, such as natural resources or natural and cultural assets, from which an economic base can develop.

Fig. 7 GDP per capita in non-metro regions, 2000–2016: GDP per capita growth, 2000–2016 (%), plotted against initial GDP per capita (USD PPP, 2000), with OECD averages marked, for three panels: non-metro regions close to metropolitan areas (fitted line y = -5E-05x + 2.4637), non-metro regions close to small-medium cities (y = -8E-05x + 3.3953), and remote rural non-metro regions (y = -3E-05x + 1.8091). (Source: OECD Regional Statistics Database. https://doi.org/10.1787/region-data-en; Rural Policy 3.0, OECD (forthcoming))

In sum, this section depicts the structural transformation towards tertiarization over the medium and long run, which has not been neutral in space: it has favored cities and densely populated areas, given their ability to transform service activities into higher productivity through agglomeration economies. Against this backdrop, low-density economies and remote areas, which lack agglomeration economies, must make the most of tradable activities to raise productivity, and these activities have faced strong competitive pressures from low-wage emerging economies. Furthermore, the shock of the global financial crisis and its aftermath since 2009 has been more severe in sparsely populated areas with a narrower economic base. Metro regions display higher productivity than non-metro regions, and the level of productivity declines as density falls. Non-metro regions close to metro regions are converging to national productivity levels, against the divergence present in remote regions. Despite this divergence, remote regions show higher variability in their performance, with a number of regions performing very well. Non-metro regions close to medium and small cities also display higher variability. Growth in these two categories depends much more on region-specific factors, which vary randomly across the geography.
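The fitted lines reported in Fig. 7 take the form of a simple unconditional convergence regression on initial income levels. This gloss on the figure is ours, not a specification stated in the source:

```latex
% Convergence regression underlying the linear fits in Fig. 7:
% g_r   = average GDP per capita growth of region r, 2000-2016 (%),
% y_r0  = initial GDP per capita (USD PPP, 2000).
\[
  g_r \;=\; \alpha + \beta\, y_{r,0} + \varepsilon_r ,
  \qquad \beta < 0 \;\;\Rightarrow\;\; \text{initially poorer regions grow faster.}
\]
% The fitted slopes are about -8e-5 (close to small/medium cities),
% -5e-5 (close to metro areas), and -3e-5 (remote): convergence is
% weakest in remote regions, consistent with the text.
```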

3 Differentiated Regional and Urban Policy Responses

The previous regional trends convey a number of hints for policy design, in particular for how to build a national strategy to raise productivity through a spatial lens. Cities and functional urban areas must try to maximize the benefits of agglomeration economies, and policy can help by mitigating the costs of diseconomies of scale in three key areas: transportation, housing, and spatial planning. Non-metro regions close to cities must also try to maximize agglomeration benefits by enhancing their linkages with cities, so that they benefit from agglomeration effects while at the same time enjoying lower diseconomies of scale and higher amenities. In contrast, remote rural areas with weak or no linkages to cities require tailor-made policy responses that address their needs. In this sense, they require perhaps the most demanding form of place-based policies, focused on their unique assets with competitive or absolute advantages.

In this regard, the role of decentralization and multi-level governance is crucial. A central government cannot have as many policies as there are different types of cities and regions! Designing place-based policies is therefore too complex a task to be fully centralized. However, decentralization also needs to be organized as a partnership and not only as a process of autonomy and devolution of competencies. Decentralization also works better when the process allows for asymmetric capacities at the local level and for experimentation (learning-by-doing). As discussed above, the question of finding the right territorial scale is key. To avoid excessive fragmentation of policy decisions, two important building blocks are the governance of metropolitan areas and mechanisms to promote supra-municipal cooperation.

3.1 Policy Responses in Cities

Agglomeration economies bring a productivity premium to cities and urban areas through scale economies, a better matching and functioning of labor markets, spillover effects, and greater technological intensity (Duranton and Puga 2004). This can be a self-reinforcing mechanism. Cities, however, also face a number of costs or diseconomies of scale. These include congestion, pollution, increasing social disparities, and environmental degradation. Policy responses can often be more successful in mitigating these costs than in accelerating the economic benefits of agglomeration, which occur on their own through market-based mechanisms.

Moreover, growing inequalities have come to the forefront of the debate around urban policies. For example, 17 of the top 25 US metros had Gini coefficients above the US national average. Often, the bigger the city, the larger the inequalities. This is driven by several factors. Skill-biased technical change occurs due to a shift in production technology that favors skilled over unskilled labor by increasing its relative productivity, demand, and wages. Cities attract high-skilled workers in search of high-paid jobs in high-value business services, management, and banking. Cities, however, also attract low-skilled workers looking for employment in low-paid service sector jobs. Skill distributions therefore tend to be more dispersed, exacerbating spatial segregation and inequalities. Poorer households tend to cluster in poorer neighborhoods, where housing costs are lower, but these are often areas with poorer amenities and poorer access to education and other public and private services. As cities grow, so does inequality, and this can be sustained over time due to spatial segregation.

Policy responses to address inequality must first focus on the right scale of the city, beyond administrative borders. Integrating policies for transportation, housing, and spatial planning across functional urban areas can be an effective tool not only for reducing inequality but also for increasing the performance of cities: by offsetting the costs, such integration raises the net benefits of agglomeration. Municipal fragmentation may reduce the economic performance of metropolitan areas, which typically cross multiple administrative boundaries. For example, the number of local governments is around 1,400 in the Paris metropolitan area and 1,700 in the Chicago metropolitan area. Analysis of the impact of horizontal fragmentation in metropolitan areas reveals lower productivity levels in cities with fragmented governance structures. For a given population size, a metropolitan area with twice the number of municipalities is associated with around 6% lower productivity. These negative effects on economic efficiency can be mitigated by almost half when a governance body at the metropolitan level exists (OECD 2015; see the illustrative calculation at the end of this section).

Another important policy area in cities is addressing the pressures of urbanization on built-up area. Cities are expanding faster than their populations are growing. At present rates, the world's urban population is expected to double in 43 years, while urban land cover will double in only 19 years (Angel et al. 2011). On average, the annual growth rate of urban land cover was twice that of the urban population between 1990 and 2000, and most of the cities studied expanded their built-up area more than 16-fold in the twentieth century (Angel et al. 2011). Policies to address urban sprawl must go beyond mere densification. They must encourage density that functions effectively. This implies a need to consider housing affordability and transport-oriented development in addition to infill and densification. An integrated and coordinated approach is warranted at the scale of functional urban areas. Furthermore, there is a need to align policies that may unintentionally promote sprawl, in areas such as taxes and regulation. For example, property tax rates across the USA are highest in central city locations and lowest in more distant areas (e.g., exurbs); they are also lower for single-family homes than for multifamily dwellings. In Mexico, the allocation of federal housing subsidies has contributed to excessive sprawl, irregular settlement patterns, and urban development in risk zones.
These types of policies have unintentionally pushed people and firms further apart from each other, exactly the contrary of what agglomeration is supposed to do.
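A back-of-the-envelope reading of the fragmentation estimate cited above (OECD 2015): if each doubling of the number of municipalities is associated with roughly 6% lower productivity, halved where a metropolitan governance body exists, the implied penalty can be computed as follows. The log-linear extrapolation and the function are our own illustration, not the study's estimation:

```python
import math

# Illustrative reading of the OECD (2015) fragmentation estimate:
# ~6% lower productivity per doubling of the number of municipalities,
# roughly halved when a metropolitan governance body exists.
def fragmentation_penalty(n_munis, n_reference, governance_body=False,
                          per_doubling=0.06):
    doublings = math.log2(n_munis / n_reference)
    penalty = per_doubling * doublings
    return penalty / 2 if governance_body else penalty

# A metro area with 4x more municipalities than a comparable one:
print(f"{fragmentation_penalty(1400, 350):.1%}")                        # 12.0%
print(f"{fragmentation_penalty(1400, 350, governance_body=True):.1%}")  # 6.0%
```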

3.2 Policy Responses in Rural Areas Close to Cities

Rural regions close to cities are increasingly interconnected with cities through a broad set of linkages. These interdependences can affect the wellbeing and economic performance of both urban and rural areas. Despite these linkages, urban and rural policies tend to be fragmented, each addressing its own constituency. This has been driven by the use of dichotomous definitions, which define explicit urban territories and rural territories, with no common ground for policy integration.

Many countries have been moving towards definitions that recognize different shades of urban-rural linkage. For example, in France, the National Institute of Statistics and Economic Studies (INSEE) has developed an indicator that measures the accessibility of services and amenities important to daily life for communities of varying population densities. This indicator distinguishes between densely populated towns, intermediately populated towns, sparsely populated municipalities, and very sparsely populated municipalities. Finland has also adopted a classification based on spatial data sets and on seven regional types: inner urban area, outer urban area, peri-urban area, local centers in rural areas, rural areas close to urban areas, rural heartland areas, and sparsely populated rural areas. The framework employs a wide range of variables to capture the diversity of rural places, including population, employment, commuting patterns, construction rates, transport access, and land use data. Collectively, these variables are used to construct indicators of economic activity, demographic change, accessibility, intensity of land use, and other attributes for each region. Interdependences between urban and rural areas can be captured by functional criteria or by accessibility to cities. Territories that emerge as self-contained spaces of urban-rural relationships do not correspond to administrative regions. Defining areas with strong interdependencies can help better integrate urban and rural policies across policy domains.

For a rural area close to a city, a critical goal is to increase mobility while at the same time limiting sprawl. On the one hand, if the population of rural areas expands into the urban sphere, it will generate traffic congestion, make it difficult and unsustainable to provide and maintain services and infrastructure, and impose pressures on the environment. On the other hand, it is desirable to increase the connections between urban and rural areas so that urban dwellers can benefit from natural and cultural amenities in rural areas and rural dwellers can access the urban labor market. Increased mobility will also help strengthen the commercial linkages between the two and broaden the benefits of agglomeration. Key policy domains that require integration across urban and rural regions include transportation, land use, and resource use.

Providing high-quality local services in rural areas, integrated with adjacent urban capacity, is also an important policy area. Rural regions close to cities can benefit from the better services available in urban areas, given their lower marginal costs. This requires establishing effective partnerships between rural and urban areas. Many countries are experimenting with innovative forms of governing urban-rural relationships through partnerships between urban and rural areas, in order to deal with several policy issues beyond the provision of services, including economic strategy and spatial planning. These partnerships allow the constraints imposed by local administrative boundaries to be overcome through an integrated approach to economic development. Much attention has recently been devoted to analyzing what governance approaches these partnerships can take and what characteristics make them deliver benefits to both rural and urban territories. Beyond partnerships, several countries are also advancing in providing integrated urban and rural policy packages. Examples include the Integrated Territorial Investments (ITI) and Community-Led Local Development (CLLD) instruments of the European Commission, used across several European countries, or the reciprocity contracts (contrats de réciprocité) in France.

3.3 Policy Responses in Remote Rural Regions

Policy responses in remote rural regions must deal with strong heterogeneity and adapt to the specificities of remote areas. The less densely populated a region is, the more the key determinants of its growth performance tend to be specific to that region. The scarcity of human settlements and activity necessarily implies a larger role for natural factors, such as climate or landforms, in shaping economic opportunities. It also calls for specialization in the tradable sectors of the economy, which concern goods and services that are not restricted to local markets. The nature of these low-density economies, including their relative proximity to markets and their sectoral composition, leads to very strong diversity. As a result, they require a differentiated policy response to make the most of their assets and opportunities. A very promising domain for rural regions lies in the opportunities generated by ICT and digitalization, which may connect some of these remote places for the first time and allow for new forms of service provision. Rural policies that embrace the digitalization agenda and better understand the opportunities offered by new technologies are a promising field for the future. Self-driving cars may drastically increase the convenience of travel. Advanced teleconferencing reduces the need for face-to-face meetings, enabling the tradability of services. Distributed manufacturing, 3D printing, drones, and autonomous trucks will make it easier to integrate remote places into supply chains. Rural areas will also benefit strongly from digital technologies for public service delivery.

4 Conclusion: An Evolving Regional Policy Paradigm

Regional policies have been evolving over the last decades in line with the structural challenges observed across countries. In the 1960s, regional policy operated in a context of strong economic growth, fiscal expansion, and low unemployment, notably among developed economies. Regional policy was mainly perceived as a tool to balance growth in a period of rapid industrialization. The main objective was wealth redistribution through fiscal transfers by national governments and public investments. This vision evolved gradually through the 1970s and 1980s, as global economic shocks gave rise to pockets of high unemployment within countries. Regional policy expanded from its earlier focus on reducing income disparities to also altering employment conditions in specific regions with high unemployment. The main tools were fiscal incentives aimed at relocating firms and the creation of public jobs. Overall, the results were disappointing, falling short of reducing disparities and attaining sustainable growth in the target areas despite significant public investment. Many governments also attempted to attract foreign direct investment (FDI) into target regions, partly to boost employment and partly on the assumption that spillovers would benefit local enterprises, increasing their technological and organizational capacity.

Against this background, regional policy evolved during the 1990s and 2000s from its previous top-down, subsidy-based interventions designed to reduce regional disparities into a much broader "family" of policies designed to improve regional competitiveness. The objectives, unit of intervention, strategies, tools, and actors shifted from redistribution to competitiveness, focusing on a development strategy covering a wide range of direct and indirect factors, targeting endogenous assets rather than exogenous investments and transfers, and emphasizing opportunities rather than disadvantages. The policy shift was influenced by the evolution of the New Economic Geography since Krugman's seminal paper (Krugman 1991), which set the foundations for understanding why economic activity concentrates in certain places. The OECD Territorial Development Policy Committee (now named the Regional Development Policy Committee) has also contributed to shaping this new vision. As this literature matured, regional disparities came to be understood as a natural outcome of economic development, and the literature highlighted the presence of permanent sources of divergence across different types of regions. The focus shifted from trying to balance and correct spatial disparities toward stimulating the growth potential of each region regardless of its position relative to others. Policies paid special attention to the enabling factors of regional growth, including human capital, infrastructure, and innovation. In a sense, this required a differentiated policy response for each region, addressing its needs and identifying its policy priorities. The availability of data and analysis became a critical tool for diagnosing priorities and monitoring progress. To implement this vision, multilevel governance became a key ingredient in the equation, aligning strategies across levels of government and improving the capacity of local and regional stakeholders.

The debate around regional policies was then largely shaped by the shock of the global financial crisis in 2009. In its aftermath, national governments faced tight fiscal budgets and a subsequent period of fiscal consolidation. Each policy domain came under scrutiny, and those that survived were asked to deliver more with fewer resources. Making the most of public investments became a priority.
Against these fiscal pressures, OECD countries developed 12 Principles on Effective Public Investment Across Levels of Government, given that around two-thirds of OECD public investment is undertaken at the sub-national level. In addition, regional policy sought efficiency gains through policy complementarities. These did not necessarily require additional resources: gains were possible through better coordination and integration across the different policy interventions occurring in each territory, and through better integrating regional policies with other sectoral policies, or, in other words, better adapting sectoral policies to the needs of regions. Countries also looked for efficiency gains at the municipal scale, which has given rise to a number of municipal mergers (Denmark, Finland, and the Netherlands) and metropolitan governance reforms (France).

A second consequence of the crisis was the elevation of the importance of regional policy. In the aftermath of the crisis, in several EU countries, the structural funds represented the bulk of public investment during times of austerity. In this sense, regional policy became one of several important measures adopted by countries to recover from the shock of the crisis, elevating the role of regional policy as part of the structural policy package. Several OECD Economic Surveys, usually focused on macro-structural issues, dedicated full chapters to regional dimensions and disparities for the first time (Slovak Republic, United Kingdom, and Italy). More recently, the IMF World Economic Outlook also devoted a structural chapter to regional disparities (IMF 2019). The effects of the global financial crisis also refocused and broadened the structural policy agenda, from the economic pillar toward wellbeing, including the social and environmental dimensions. This meant rethinking policy trade-offs and complementarities (OECD 2011). At the countrywide level, this exercise can be quite abstract. In contrast, a more granular approach taking into account the specific conditions of each type of region and city can help identify trade-offs and potential complementarities among the objectives of efficiency, equity, and environmental sustainability in a more concrete manner, as opposed to policies that are "spatially blind" or purely based on sectoral approaches.

The growing gap between metro and non-metro areas may have fuelled a "geography of discontent" (McCann 2019), and this has changed the current debate on regional policies. Under the previous paradigm, regional inequalities were viewed as a necessary by-product of development, which called for compensatory public interventions. However, when disparities become large and persistent, they may threaten the national system. Accordingly, some countries have shifted their focus toward addressing these larger inequalities, including the new policy for responsible development in Poland and Korea's balanced development approach. The new debates on regional policy are also shaped by the effects of the so-called technological, demographic, and environmental megatrends. These trends may create opportunities, but also tensions among different societal objectives (efficiency, equity, and sustainability). Future considerations for place-based policies will be affected by these megatrends, but their effects will vary from region to region within the same country. Appropriate policy responses need to take this diversity into account and will need to be tailored to each region's specific trade-offs.

5 Cross-References

▶ Place-Based Policies: Principles and Developing Country Applications

Acknowledgments We would like to thank Juan R. Cuadrado-Roura for helpful comments and suggestions.

Disclaimer The views expressed in this paper are those of the authors and do not necessarily reflect those of the OECD and its Member countries.

Annex: A Typology to Classify TL3 Regions

In this chapter, we use a method to classify TL3 regions into metropolitan and non-metropolitan areas according to the following criteria (a code sketch of this decision rule follows the list):
• Large metro TL3 region (MR-L): if more than 50% of its population lives in a functional urban area (FUA) of at least 1.5 million inhabitants.
• Metro TL3 region (MR-M): if the TL3 region is not a large metro region and 50% of its population lives in an FUA of at least 250 thousand inhabitants.
• Non-metro area with access to a metro TL3 region (NMR-M): if more than 50% of its population lives within a 60-minute drive from a metro (an FUA with more than 250 thousand people), or if the TL3 region contains more than 80% of the area of an FUA of at least 250 thousand inhabitants.
• Non-metro area with access to a small/medium city TL3 region (NMR-S): if the TL3 region does not have access to a metro and 50% of its population has access to a small or medium city (an FUA of more than 50 thousand and less than 250 thousand inhabitants) within a 60-minute drive, or if the TL3 region contains more than 80% of the area of a small or medium city.
• Non-metro remote TL3 region (NMR-R): if the TL3 region is not classified as NMR-M or NMR-S, i.e., if 50% of its population does not have access to any FUA within a 60-minute drive.
For further details, see Fadic et al. (2019).
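The criteria above amount to a small decision tree. The following is a minimal sketch (ours, not the OECD's implementation): all field names on the input record are hypothetical, and the population/area shares are assumed to have been precomputed from FUA boundaries and 60-minute drive-time data as in Fadic et al. (2019).

```python
# Sketch of the TL3 typology as a decision rule. Input fields are hypothetical
# precomputed shares; thresholds follow the criteria listed above.
def classify_tl3(region: dict) -> str:
    # Share of population living in an FUA of at least 1.5 million inhabitants
    if region["pop_share_fua_1p5m"] > 0.5:
        return "MR-L"  # large metro region
    # Share of population living in an FUA of at least 250 thousand inhabitants
    if region["pop_share_fua_250k"] > 0.5:
        return "MR-M"  # metro region
    # Access to a metro: half the population within a 60-minute drive of an
    # FUA > 250k, or the region contains > 80% of the area of such an FUA
    if region["pop_share_60min_metro"] > 0.5 or region["area_share_fua_250k"] > 0.8:
        return "NMR-M"
    # Access to a small/medium city (FUA between 50 and 250 thousand)
    if region["pop_share_60min_small_city"] > 0.5 or region["area_share_fua_50_250k"] > 0.8:
        return "NMR-S"
    return "NMR-R"  # remote: no FUA reachable within 60 minutes for half the population

# Example with made-up values: metro access dominates, so NMR-M is returned
example = {"pop_share_fua_1p5m": 0.1, "pop_share_fua_250k": 0.2,
           "pop_share_60min_metro": 0.7, "area_share_fua_250k": 0.0,
           "pop_share_60min_small_city": 0.9, "area_share_fua_50_250k": 0.0}
print(classify_tl3(example))  # -> "NMR-M"
```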

References
Acemoglu D, Dell M (2010) Productivity differences between and within countries. Am Econ J Macroecon 2(1):169–188
Angel S et al (2011) Making room for a planet of cities. Policy focus report, Lincoln Institute of Land Policy, Cambridge, USA
Baldwin R, Evenett S (2015) Value creation and trade in the 21st century manufacturing. J Reg Sci 55(1):31–50
Barca F, McCann P, Rodriguez-Pose A (2012) The case for regional development intervention: place-based vs. place-neutral policies. J Reg Sci 52(1):134–152
Dall'Erba S, Fang F (2017) Meta-analysis of the impact of European Union structural funds on regional growth. Reg Stud 51(6):822–832
Duranton G, Puga D (2004) Micro-foundations of urban agglomeration economies, Chapter 48. In: Henderson JV, Thisse JF (eds) Handbook of regional and urban economics, vol 4, 1st edn. Elsevier, Amsterdam, pp 2063–2117
Duranton G, Venables A (2019) Place-based policies: principles and developing country applications. In: Handbook of regional science. Elsevier, Amsterdam
Fadic M, Garcilazo J, Moreno-Monroy A, Veneri P (2019) Classifying small (TL3) regions based on metropolitan population, low density and remoteness. OECD regional development working papers, 2019/06, OECD, Paris
Hirschman A (1958) The strategy of economic development. Ann Am Acad Pol Soc Sci 325(1):125–126
Iammarino S, Rodriguez-Pose A, Storper M (2019) Regional inequality in Europe: evidence, theory and policy implications. J Econ Geogr 19:273–298
IMF (2019) World economic outlook: global manufacturing downturn, rising trade barriers. International Monetary Fund, Washington, DC
Krugman P (1991) Increasing returns and economic geography. J Polit Econ 99(3):483–499
Los B et al (2017) The mismatch between local voting and the local economic consequences of Brexit. Reg Stud 51(5):786–799
McCann P (2018) The trade, geography and regional implications of Brexit. Pap Reg Sci 97:3–8
McCann P (2019) Perceptions of regional inequality and the geography of discontent: insights from the UK. Reg Stud. https://doi.org/10.1080/00343404.2019.1619928
Morikawa M (2011) Economies of density and productivity in service industries: an analysis of personal service industries based on establishment-level data. Rev Econ Stat 93(1):179
Myrdal G (1957) Economic theory and under-developed regions. G. Duckworth, London
Odendahl C et al (2019) The big European sort? The diverging fortunes of Europe's regions. Centre for European Reform, London
OECD (2011) OECD regional outlook: building resilient regions for stronger economies. OECD Publishing, Paris
OECD (2015) The metropolitan century: understanding urbanisation and its consequences. OECD Publishing, Paris
Ottaviano G, Thisse J-F (2004) Agglomeration and economic geography, Chapter 58. In: Henderson JV, Thisse JF (eds) Handbook of regional and urban economics, vol 4, 1st edn. Elsevier, Amsterdam, pp 2563–2608
Perroux F (1955) Note sur la notion de pôle de croissance. Économie appliquée 8:307–320

Further Reading (OECD References)
OECD (2012) Redefining urban. OECD Publishing, Paris
OECD (2013) Rural-urban partnerships: an integrated approach to economic development. OECD rural policy reviews. OECD Publishing, Paris
OECD (2018) Productivity and jobs in a globalised world: (how) can all regions benefit? OECD Publishing, Paris
OECD (2019a) Making decentralisation work: a guide for policy-makers. OECD Publishing, Paris
OECD (2019b) OECD regional outlook: leveraging megatrends for rural and urban areas. OECD Publishing, Paris

European Regional Policy: What Can Be Learned
Juan R. Cuadrado-Roura

Contents
1 Introduction
2 When and Why Was the European Regional Policy Implemented?
2.1 A Long Period Without an EU Regional Policy
2.2 Factors Driving the Implementation of a European Regional Policy
3 The Bases of the New Regional Policy and Their Implementation Through Multiannual Programs
3.1 An Important Framework: Decision on the Essential Principles of the New Policy
3.2 Key Aspects of the First Two Multiannual Programs
3.3 A Balance of the Implementation and Results of the First Phase of the European Regional Policy: 1989–1999
4 Changes in European Regional Policy During the 2000–2020 Period
4.1 The 2000–2006 and 2007–2013 Programs
4.2 The 2014–2020 Program: Key Aspects and Critical Elements
5 What Can Be Learned? An Evaluation of Achievements, Weaknesses, and Perspectives of the European Regional Policy
5.1 Positive Aspects of the European Regional Policy or Cohesion Policy
5.2 Criticisms and Some Concerning Topics
5.3 Positive Factors for Regional Development and Greater European Regional Policy Effectiveness
5.4 Future Perspectives
6 Conclusions
7 Cross-References
References

J. R. Cuadrado-Roura (*)
Department of Economics and Business Administration, Faculty of Economics and Business, University of Alcalá, Alcalá de Henares, Madrid, Spain
e-mail: [email protected]
© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_140


Abstract
The need for a common regional policy was not considered in the Treaty of Rome, which brought about the European Economic Community (1957). Two main forces supported the important change in the 1980s and launched the European Regional Policy (ERP): first, the integration of some countries with a GDP per capita significantly below the EU average and with marked within-country disparities, such as Ireland, Greece, Portugal, and Spain, and, second, the possible impact on regional disparities of the advances toward a more integrated Union. The arguments supporting the decision to implement regional policy on a European scale are reviewed in Sect. 2. The principles and main aspects of the first two multiannual programs, as well as a balance of their results, are analyzed in Sect. 3. From the very beginning, two key aspects were underlined: the decision to primarily benefit lagging regions and the belief that regional policies should be designed with a long-term focus. Nevertheless, from 2000 onwards, some additional goals have been introduced which are not clearly compatible. The main one has sought to combine the objectives of increasing economic and social cohesion while also addressing the weaknesses of the European economy in terms of productivity, innovation, and technological change. This has constituted the basis of the last three programs (Sect. 4), and its main consequence has been that the objective of equality has been relegated to a second plane in favor of objectives related to growth efficiency. Section 5, also important, offers an evaluation of the achievements, weaknesses, and perspectives of the ERP. This allows us to extract a number of significant lessons from the European experience. Some final conclusions close the chapter.

Keywords
European Regional Policy · Structural funds · Economic and social cohesion · Concentration · Long-term focus · Results and lessons

Abbreviations
CP  Cohesion policy
CEC  Commission of the European Community
CSF  Community support framework
EAGGF  European Agricultural Guidance and Guarantee Fund
EC  European Community
ECF  European Cohesion Fund
ERP  European Regional Policy
ERDF  European Regional Development Fund
ESF  European Social Fund
EU  European Union
EUC  European Union Commission
NUTS 2  Nomenclature of territorial units for statistics. Number 2 refers to "regions"; number 1 to "macro-regions"; and number 3 to "counties, provinces, or departments"
SEA  Single European Act
TEU  Treaty of European Union

1 Introduction

The analysis of the economic growth of a country, or group of countries, and the problems brought about by such growth can undoubtedly be approached from an essentially macroeconomic perspective. In fact, this is a very common approach, as it does not require taking into consideration the spatial or territorial aspects that accompany any growth process. Thus, our attention can be focused on issues such as financing problems, the most desirable sectoral changes, the most suitable endowment of infrastructures, the promotion and introduction of technological developments, and, last but not least, the achievement of an equilibrium between economic growth, price stability, and the evolution of the foreign trade balance.

However, experience has taught us that some issues are directly linked to territory. First, the analysis of multiple cases has proven that, for several reasons which do not always coincide, the benefits of economic growth tend not to be uniformly distributed throughout a territory. Second, in a free market economy, individuals and businesses often make many of their decisions considering the determinants, or advantages and drawbacks, offered by a territory. And lastly, the unequal behavior of the different regions of a country or group of countries may increase any preexisting differences and may also affect political stability and the population's general welfare.

Reality has shown that the socioeconomic territorial differences existing in a country at any given moment in time can increase during a growth process. Likewise, such processes can also give rise to hierarchical changes among the most dynamic and lagging regions, or else cause an increasing concentration of economic power in certain areas, while the rest are gradually abandoned. These possibilities have been accepted by some well-known theoretical approaches (from Myrdal (1957), Hirschman (1958), Kaldor (1970), and Dixon and Thirlwall (1975) to Krugman (1991)), although they have generally not been included in the mainstream of economics.

Historically, the factors which have marked the advantages and disadvantages of a specific area or region are manifold. Some of them have, of course, a more immovable nature than others. A region's geographic location is undoubtedly one of those factors, as are its climate, availability of natural resources, orography, and the presence of richer neighboring regions, all of which can favor or hinder the development of certain productive activities. To these factors, which could be considered "natural," we must often add political and economic decisions taken in the past, including the design of rail and road infrastructures, the choice of a state capital, and the advantages afforded to certain ports. In addition, it is also important to bear in mind the influence of other noneconomic factors (demographic, institutional, and cultural) which also distinguish some territories and/or urban agglomerations from others and may favor or hinder their development. Trade agreements between countries, particularly those seeking economic and political integration, as in the case of the European Union (EU), may cause an increase of existing economic differences between states, as well as between their regions.


It is usually accepted that integration and trade agreements will entail very positive benefits in terms of growth, wellbeing, and stability for the citizens of signatory countries. However, this idea often fails to consider the proven fact that the consequences of trade agreements, and more so of those seeking greater integration, can, and often do, have a very unequal impact at the territorial scale. This can be gathered from many experiences: the real benefits may tend to be concentrated in certain metropolitan areas and/or regions, while others are overlooked or may even be damaged.

A first objective of this chapter is to analyze, in a condensed manner, the implementation and development of a regional policy at the EU-wide scale, also currently known as "cohesion policy." The main objective is to underscore some of the most relevant milestones and changes that this policy has experienced from its inception until the present day, evaluating its positive and negative effects. To this effect, we describe the most relevant features that have characterized this policy, the funds allocated, and the main strategic and practical changes that have taken place so far. Based on this and on the experience accumulated over the last three decades, our attention will be focused on presenting some critical reflections on the lessons that can be learned in terms of the design, implementation, and results of the extensive experience of the European Regional Policy (ERP).

The chapter is organized as follows. Section 2 (after this introduction) focuses on explaining why the ERP took so long to be implemented and the arguments – particularly political – used to justify the need for regional policy at a European scale. In Sect. 3 we study the programmatic bases of the new ERP, the features of the first multiannual programs (1986–1989, 1989–1993, and 1994–1999), and their results. Section 4 focuses on the last multiannual programs, up to 2020, and the progressive overlap between the objectives of the ERP and the concerns and changes of the so-called Renewed Lisbon Strategy and the 2020 Agenda. Lastly, in Sect. 5 we aim to highlight the positive and negative aspects of the ERP, summarizing some lessons that can be drawn from the European experience and its future development. The chapter closes with some final remarks which underscore the main ideas and results presented throughout the text.

There is an important observation to be considered. The content of this chapter is the result of reviewing an extensive set of works and analyses on the topic, to which we have added certain direct experiences regarding the implementation of EU regional or cohesion policies. Thus, the references included in the text and the bibliography listed at the end must only be considered as a selection of works that can be used to expand on the issues analyzed herein.

2 When and Why Was the European Regional Policy Implemented?

When we talk about the European Regional Policy (ERP), we tend to forget that it was not implemented in 1957, when the Treaties of Rome were signed (March 25, 1957), far from it, but rather over three decades later. The reasons for this delay and the factors that determined the deep change which took place during the mid-1980s deserve a few comments.

2.1 A Long Period Without an EU Regional Policy

The founders and heads of state of the six countries which signed the Treaty of the European Economic Community (EEC) were aware of the risk that the benefits generated by the agreement (particularly the liberalization of trade) would not be homogeneously distributed among the six countries and their respective regions. However, regional inequalities and their possible increase or reduction were far from being a significant concern for the Community, and it was decided to leave these kinds of issues in the hands of the governments that signed the EEC Treaty. The text paid close attention to the mechanisms and decisions needed to progress toward a customs union, establish a common agricultural policy (CAP), achieve freedom of movement for individuals, goods, services, and capital, and develop policies for transport, competition, etc. Only in the preamble, and in a very broad manner, was the problem of interregional inequalities referenced. The policies targeting this objective were considered an internal matter of each Member State, whose actions and aids, according to Article 92 of the EEC Treaty, could not go against the rules of competition or against the common market.

The most relevant change in this position took place during the 1986–1988 period, as will be shown in the next sections, although the long intervening period of three decades witnessed certain events that are worth remembering, despite their not substantially altering the position adopted in 1957. For example, the 1960s brought the possibility of credit awards for states to carry out some regional actions in their respective countries. In 1968, a Directorate General for Regional Policy (DG XVI) was created in the European Commission, mainly to collect data and monitor the regional actions carried out by Member States. In 1973, the so-called Thomson Report underlined the importance of the regional imbalances existing within the Community, the risk of increasing economic and social differences within the European Community, and the need to tackle them. Two years later, in 1975, the European Regional Development Fund (ERDF) was approved. Its limited resources of 258 million Ecus were assigned to support specific regional policies developed by Member States. Lastly, in 1979, the European Commission was authorized to use "out of quota" resources (5% of ERDF funds) for projects in regions or areas other than those designated by each country. In summary, we can state that the goal of actions based on the ERDF was not to carry out a European Regional Policy but rather to "collaborate" with the actions that each Member State was developing. Therefore, no ERP can be said to have existed at the time.

2.2 Factors Driving the Implementation of a European Regional Policy

The objective of deploying a European Regional Policy first began to take shape in the mid-1980s. At the time, it was decided that the Community should embrace the need to have certain instruments and resources which would allow it to face the problem of the economic differences existing within EU Member States. Both the European Commission – presided over by Jacques Delors – and later the European Council argued in favor of the need to anticipate the possibility that the advances made toward greater integration of the then 12 Member States could cause an increase in the existing interregional differences. The European Commission is at the epicenter of the political construction process of the EU: it has legislative powers under the approval of the European Parliament, is the guardian of the Treaties, is broadly responsible for management, and proposes and designs the development of new initiatives, which must, in any case, be approved by the European Council (comprising a representative of each Member State, generally its president or head of government). In January 1985, Jacques Delors became president of the European Commission and set in motion important initiatives, including the implementation of the new European Regional Policy.

At this point, we should pose certain questions related to this decision and seek to provide some answers, given the significant importance of this new policy. There are at least three relevant questions we should aim to answer: Which were the key motivations behind the implementation of a European Regional Policy (ERP)? Were these reasons mainly political? Did they have any theoretical foundation?

2.2.1 Essentially Political Arguments and Reasons Linked to the Incorporation of New Countries and the Goal of Accelerating the Integration Process

Two basic reasons drove the decision to promote the Community's own regional policy. The first was, undoubtedly, the increase in the number of Member States, up to 12, and the implications thereof. The integration of Greece (1981) and of Portugal and Spain (1986) significantly increased regional disparities within the EEC, and correcting them was almost considered an obligation, in terms of solidarity, and necessary to ensure greater economic and social cohesion. The second relevant fact was the approval, in 1986, of the Single European Act, intended to further the integration process. While this decision was considered very important for the future of the EEC, many quarters held the opinion that it could deepen disparities between European regions, since the benefits of the internal market would tend to be unequally distributed throughout the Union. These two reasons were clearly decisive for the launch of the new regional policy. Both accept the idea that progress toward greater economic integration would benefit central European regions far more than less developed ones, including some located in the central countries of the EU and, especially, the most peripheral ones in the south of Europe. Robert Schuman, one of the founding fathers of the EEC, had already pointed out that the construction of the Community should be carried out with tenacity and based on a requirement of solidarity between Member States. Undoubtedly, the principle of solidarity inspired the decision to develop a European Regional Policy which should favor the less developed regions of the Community, employing a significant amount of the European budget's funds, thanks to the net contributions of richer countries.

2.2.2 Economic and Political Arguments and the Lack of a Theoretical Basis

Some of the arguments that are commonly used to justify the deployment of any regional development policy include political motivations, historical claims, and others which are usually specific to some countries, as well as economic arguments. From the standpoint of economic policy, the application of regional policy actions is usually justified as part of income redistribution policies aimed at reducing interregional disparities. The economic theory and thesis on territorial taxation argue that interregional convergence policies belong among income redistribution policies and that personal redistribution policies are not enough, instead requiring support for the general development of lagging regions in order to bring their social and economic conditions in line with the rest of the country. This argument is based on the fact that efficiency is not fully guaranteed by the market and its adjustment mechanisms and that the malfunction of adaptation and growth mechanisms in some regions means that their resources are inadequately utilized or even left idle.

The fact that lagging regions show high and persistent unemployment rates and a lack of new job creation has very clear economic consequences. The reduction of unemployment in regions with high rates benefits the whole country, both through the impact of creating new jobs and their contribution to overall economic production, and through the effects of reducing subsidies and unemployment aid. Likewise, other arguments in favor cite that developing a lagging territory can provide benefits for the entire country, since the available resources of that region or territory are utilized more efficiently, while at the same time, if the population does not migrate, the abandonment of existing equipment (homes, developments, infrastructures, etc.) is prevented. Conversely, regional disparities can entail high costs related to the rapid growth of some urban areas with a high demand for public services and infrastructures, as well as the time required for movements within those urban areas. Another argument is that reducing regional disparities can contribute in some measure to reducing certain inflationary pressures driven by the high salaries and the pressures of demand for products and services which arise in metropolitan areas.

Consequently, incentivizing and supporting the development of lagging regions can bring benefits for the economy in general, not just in terms of equality but also in terms of efficiency and resource allocation. These arguments have been widely used in treatises, reports, and analyses by numerous researchers; as examples, see Biehl (1986) and Armstrong and Taylor (2000). In fact, population movements via migrations driven by poverty and underdevelopment have associated costs that are not valued or discounted by markets. Applying policies aimed at developing lagging regions is thus not only justified in terms of equality but could also promote the efficiency of the overall economy, as well as the utilization of the resources available in those regions. In addition, there are also environmental and ecological arguments in favor of implementing development policies in areas or regions with low income levels.


When their inhabitants abandon such areas due to a lack of employment and collective services, or simply in search of better conditions and higher salaries in other regions or metropolitan areas, the local forests are no longer tended, and farming production also falls. This does not even take into consideration that many of their products and derivatives are able to sustain jobs and open new possibilities for economic development, such as rural tourism and the production of artisanal goods derived from traditional farming practices that are highly valued by society.

The documents drafted by the Commission to justify the need for a common regional policy show a clear predominance of the political arguments, which have already been addressed, but very few economic arguments such as those mentioned above, which are in fact largely absent from the documents and reports relating to the implementation of the new ERP. Lastly, another aspect that did not appear to concern those in charge of designing the new ERP was the consideration of theories of regional development. The basic document mentions, always superficially, the arguments of the cumulative growth model (from Myrdal and Kaldor to Dixon and Thirlwall, albeit without citing them), as well as international trade theory (following Heckscher-Ohlin). These and other theories of regional development and of the causes of interregional disparities are practically ignored. Only several years later did the reports of the European Commission and a good number of works conducted by external teams include the well-known ideas of the export base theory, milieux innovateurs, the role of innovation in endogenous growth, and the arguments in favor of supporting entrepreneurship and business innovation.

3 The Bases of the New Regional Policy and Their Implementation Through Multiannual Programs

From the previous section, we can deduce that the new ERP was based on arguments of a political nature, directly linked to the concern that progress toward a real single market could generate greater inequality among countries and also at the regional scale, particularly as a result of the incorporation of several countries with a GDP per capita notably lower than the European average and including regions which were significantly underdeveloped economically. Once the decision was adopted, the focus turned to "instrumenting" the policy and providing it with sufficient financial resources. The latter aspect was put into effect when the European Council approved the decision (March 1988) to substantially increase the budget allocated to driving the Community's own regional policy. This included having to design an operational system for that "new" regional policy and its monitoring, as well as the regulations of the structural funds, which would play a key role and were to operate in a coordinated manner. Such "structural funds" were the ERDF (European Regional Development Fund), the ESF (European Social Fund), and the guidance section of the EAGGF (European Agricultural Guidance and Guarantee Fund), also known as FEOGA-o.

3.1 An Important Framework: Decision on the Essential Principles of the New Policy

Four basic principles guided the new policy:
• Concentration. In the assignment of resources, the new regional policy was to give clear preference to the most backward regions (those with a GDP per capita below 75% of the Community average).
• Programming. Interventions were to be organized into multiannual programs rather than isolated projects.
• Partnership. Programs were to be designed and implemented in cooperation between the Commission and the national, regional, and local authorities concerned.
• Additionality. European funds were to complement, not replace, the Member States' own regional development spending.

New Economic Geography and the City
C. Gaigné and J.-F. Thisse

where α > 0 measures the desirability of the manufactured good with respect to the numéraire, whereas γ (0 < γ < β) is a measure of the degree of substitution between varieties. Moreover, (1) shows that the marginal utility of variety i decreases with its own consumption as well as with the total consumption of the manufactured good. Preferences over all varieties are obtained by nesting the subutility (1) into a linear utility:

U\left(q_0, q(i); n\right) = \alpha \int_0^n q_i \,\mathrm{d}i - \frac{\beta}{2} \int_0^n q_i^2 \,\mathrm{d}i - \frac{\gamma}{2n} \int_0^n \!\! \int_0^n q_i q_j \,\mathrm{d}j \,\mathrm{d}i + q_0, \qquad (2)

where q_0 is the quantity of the numéraire. The second term in (2) has the nature of a Herfindahl index, thus meaning that β > 0 amounts to assuming that the consumer has a love for variety. Indeed, when an individual consumes nq units of the manufactured good such that consumption is uniform and equal to nq/x on [0, x] and zero on [x, n], she enjoys the utility level

U = \alpha \int_0^x \frac{nq}{x} \,\mathrm{d}i - \frac{\beta}{2} \int_0^x \left(\frac{nq}{x}\right)^2 \mathrm{d}i - \frac{\gamma}{2n} \int_0^x \!\! \int_0^x \frac{nq}{x} \cdot \frac{nq}{x} \,\mathrm{d}j \,\mathrm{d}i + q_0 = \alpha n q - \frac{\beta}{2x} n^2 q^2 - \frac{\gamma}{2n} n^2 q^2 + q_0.

This function is increasing in x and, hence, maximized at x = n, where variety consumption is maximal. Consumers endowed with such preferences thus render large markets more attractive, as long as locally produced varieties are cheaper than imported varieties. The lot size being fixed, there is no need to specify how housing space enters preferences. In what follows, we ease the burden of notation by adopting two normalizations that entail no loss of generality: the unit of the numéraire is chosen for α = 1 and the unit of the manufactured good for γ = 1 to hold.

Each consumer supplies inelastically one unit of labor. Consumers commute to the CBD where jobs are located and earn an income w_r, which is determined at the equilibrium. The unit commuting cost is given by θ > 0, and thus a consumer located at x > 0 bears a commuting cost equal to θx units of the numéraire. In addition, each consumer is endowed with q̄_0 > 0 units of the numéraire, which is sufficiently large for the individual consumption of the homogeneous good to be strictly positive at the equilibrium outcome.

It remains to close the model by specifying the structure of land ownership. Since we use quasi-linear preferences, the demand for the manufactured good does not depend on consumers' incomes. Therefore, the way the land rent is redistributed affects only the consumption of the numéraire. In what follows, we follow the main approach used in urban economics and assume that land is collectively owned, so that the aggregate land rent is evenly distributed across consumers. The budget constraint faced by a consumer living in city r is given by

\int_0^n p_{ir} q_{ir} \,\mathrm{d}i + q_0 + R_r(x)/\delta + \theta x = w_r + \bar{q}_0, \qquad (3)

where p_ir (q_ir) is the price (consumption) of variety i in city r, and R_r(x) is the land rent at x, so that R_r(x)/δ is the price paid by a consumer to reside at x. A consumer chooses her location and consumption bundle so as to maximize her utility (2) subject to the budget constraint (3). This yields the following individual demand for variety i in city r (Tabuchi and Thisse 2006):

q_{ir} = \frac{1}{\beta+1} - \frac{p_{ir}}{\beta} + \frac{1}{\beta(\beta+1)}\,\bar{p}_r \quad \text{with} \quad \bar{p}_r \equiv \frac{1}{n} \int_0^n p_{kr} \,\mathrm{d}k, \qquad (4)

where p̄_r is the average price of all varieties in city r. The demand (4) captures in a very simple way the impact of competition on a firm's demand: a higher (lower) average price shifts upward (downward) the demand for variety i because local competition is softer (tougher), thus making variety i more (less) attractive to city r-consumers. Furthermore, because each firm is negligible, the price p_ir chosen by firm i has no impact on the average price p̄_r. The gains from using (2) are reaped in the various issues that can be addressed with our baseline model; these issues, which are hard to handle under CES preferences, are discussed below.
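For completeness, here is a minimal sketch of the derivation behind (4) under the normalizations α = γ = 1; the averaging step is ours, as the chapter states the result directly:

\begin{aligned}
&\text{FOC for variety } i: && 1 - \beta q_{ir} - \frac{1}{n}\int_0^n q_{jr}\,\mathrm{d}j = p_{ir},\\
&\text{averaging over } i: && \bar{Q}_r \equiv \frac{1}{n}\int_0^n q_{jr}\,\mathrm{d}j = \frac{1-\bar{p}_r}{\beta+1},\\
&\text{substituting back:} && q_{ir} = \frac{1-p_{ir}-\bar{Q}_r}{\beta} = \frac{1}{\beta+1} - \frac{p_{ir}}{\beta} + \frac{\bar{p}_r}{\beta(\beta+1)}.
\end{aligned}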

2.1.2 Land Market and Urban Costs

We call urban costs the sum of the housing and commuting costs borne by a city r-consumer residing at any location x:

UC_r(x) = R_r(x)/\delta + \theta x.

Let Ψ_r(x) be the highest price a worker is willing to pay to reside at location x in city r. Since there is only one type of labor, the equilibrium land rent is equal to R_r(x) = \delta \max\{\Psi_r(x), 0\}. The cost of a longer commute is exactly offset by a decrease in the bid rent Ψ_r because ∂Ψ_r/∂x + θ = 0. Hence, Ψ_r(x) = k − θx, where k is a constant to be determined. Since the land rent at the city limit x̄_r is equal to the opportunity cost of land, here zero, we have k = θx̄_r, and thus Ψ_r(x) = θλ_rL/(2δ) − θx. Therefore, the price paid to reside at x is the mirror image of the corresponding commuting costs:

\frac{R_r(x)}{\delta} = \theta \left( \frac{\lambda_r L}{2\delta} - x \right) \quad \text{for } x < \bar{x}_r.

Consequently, the price paid by a consumer to live at x > 0 decreases with the height of buildings because the average commute is shorter. The urban costs borne by consumers in city r do not depend on their residential location within this city and are equal to

UC_r(\lambda_r) = \frac{\theta}{\delta} \, \frac{\lambda_r L}{2}. \qquad (5)

Since they increase with city size, urban costs act as the dispersion force. As expected, inter-city differences in urban costs increase with θ and decrease with δ because more consumers can share the cost of residing at a given distance x from the CBD.
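As a quick illustration of (5) (our arithmetic, not the chapter's): moving from symmetric dispersion (λ_r = 1/2) to full agglomeration (λ_r = 1) exactly doubles urban costs,

UC_r(1) - UC_r\!\left(\tfrac{1}{2}\right) = \frac{\theta L}{2\delta} - \frac{\theta L}{4\delta} = \frac{\theta L}{4\delta} > 0,

so the strength of the dispersion force scales directly with the commuting cost θ and the population size L, and inversely with the density δ.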

2.1.3 Producers

Firms produce a differentiated and tradable good under monopolistic competition and increasing returns; for simplicity, they do not use land. A firm produces a single variety, and any two firms supply two different varieties. Producing a variety of the manufactured good requires no marginal input and a positive fixed requirement of ϕ units of labor. Hence, the total mass of varieties supplied in the economy is given by n = L/ϕ and the mass of firms producing in city r by n_r = λ_rL/ϕ, so that λ_r is also the share of firms located in city r. The value of ϕ may be interpreted as an inverse measure of labor productivity. Markets are segmented, that is, each firm is able to set a price specific to the market in which its output is sold. Since preferences and technologies are symmetric, firms sell their varieties at the same price in each city. Thus, we may disregard the index i and write the profits earned by a city r-firm as follows (s ≠ r):

\pi_r = p_{rr}\, q_{rr}(p_{rr})\, \lambda_r L + (p_{rs} - \tau)\, q_{rs}(p_{rs})\, \lambda_s L - \phi w_r, \qquad (6)

where p_rr is the price set by the local firms and p_rs the consumer price charged in city s by the firms located in r ≠ s; q_rr and q_rs are the quantities sold by the r-firms in cities r and s, respectively. The average price in city r is given by p̄_r = λ_r p_rr + λ_s p_sr. Plugging this expression into (4) and solving the first-order conditions yields the equilibrium prices:

p_{rr}^{*} = \frac{2\beta + \tau(1-\lambda_r)}{2(2\beta+1)} \qquad \qquad p_{rs}^{*} = p_{ss}^{*} + \frac{\tau}{2}. \qquad (7)

Observe that p_rr* depends on λ_r and τ because the share of local firms matters, while transport costs affect p_rr* through the consumer price charged by the foreign firms. Both prices p_rr* and p_rs* capture the pro-competitive effects associated with a higher share of local competitors and lower transport costs. In particular, the prices of local varieties are lower in the larger city than in the smaller one because competition is tougher. This result runs against the conventional wisdom that tradables are more expensive in larger cities because land rents and wages are higher there. This argument overlooks the following two effects: competition is tougher in the larger city, while the quality improvements of existing goods and the introduction of new goods allow consumers to substitute low-priced goods for high-priced goods. Controlling for these effects, Handbury and Weinstein (2015) use a dataset covering 10–20 million purchases of grocery items and find that prices for the same goods are equal or lower in larger cities. This highlights a trade-off that has been neglected in the urban economics literature: consumers bear higher urban costs in larger cities, but tradable goods may be available at lower prices.

Furthermore, (7) also shows that lowering transport costs exacerbates competition in each city, though the consumer price of imported varieties is higher than that of domestic varieties because distant firms have to cover the cost of shipping their output. Therefore, consumption is biased toward locally produced goods. By contrast, the producer price of imported varieties is smaller than that of local varieties: the r-firms choose a pass-through smaller than 1 to facilitate the penetration of their varieties in city s. In other words, there is spatial price discrimination. Finally, for inter-city trade to occur and its pro-competitive effects to become concrete, transport costs cannot be too high: p_rs* − τ > 0. This condition holds regardless of the spatial distribution of firms if and only if τ < τ_trade ≡ 2β/(2β + 1) < 1.
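The pricing formulas in (7) can be checked mechanically. Below is a small, self-contained sympy sketch (ours, not the authors'); it recovers p_rr* by combining each firm's first-order condition with the consistency requirement on the average price:

```python
import sympy as sp

beta, tau, lam = sp.symbols('beta tau lambda_r', positive=True)
p, pbar = sp.symbols('p pbar')

# Demand (4) faced by a firm charging p when the city's average price is pbar
q = 1/(beta + 1) - p/beta + pbar/(beta*(beta + 1))

# Each firm is negligible and takes pbar as given.
p_local = sp.solve(sp.diff(p*q, p), p)[0]           # local sales: max p*q
p_import = sp.solve(sp.diff((p - tau)*q, p), p)[0]  # imports: max (p - tau)*q

# Consistency: the average price is the share-weighted mean of both prices
pbar_star = sp.solve(sp.Eq(pbar, lam*p_local + (1 - lam)*p_import), pbar)[0]
p_rr = sp.simplify(p_local.subs(pbar, pbar_star))

# Matches (7): p_rr* = (2*beta + tau*(1 - lambda_r)) / (2*(2*beta + 1))
assert sp.simplify(p_rr - (2*beta + tau*(1 - lam))/(2*(2*beta + 1))) == 0
print(sp.factor(p_rr))
```

By construction, p_import − p_local = τ/2 in every market, which is the pass-through of one half stated in (7).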

2.1.4 Labor Market and Wages

Urban labor markets are local, while labor market clearing implies that the creation and destruction of firms is governed by the location of consumers. Specifically, the equilibrium wage is determined by a bidding process in which potential firms compete for workers by offering higher and higher wages until no firm can profitably enter the market. Put simply, operating profits are completely absorbed by the wage bill. The equilibrium quantities sold are given by q_rr* = p_rr*/β and q_rs* = (p_rs* − τ)/β. Plugging the equilibrium prices and quantities into π_r and solving for w_r gives the equilibrium wage in city r, namely w_r* = Π_r/ϕ, where Π_r denotes the operating profits made in r. The impact of λ_r on w_r* is a priori unclear. Indeed, as λ_r rises, city r's size gets bigger, but the local firms' markup is smaller; by contrast, the size of city s gets smaller while the foreign firms' markup is higher. However, it is readily checked that the equilibrium wage is higher when firms and workers are agglomerated in city r than when they are equally dispersed between the two cities. This wage premium is caused here by market size, not by agglomeration economies. Note that the consumer surplus in A evaluated at the equilibrium is also higher under agglomeration than under dispersion because all firms are located in A, while urban costs take their highest value under agglomeration.
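Using q_rr* = p_rr*/β and q_rs* = (p_rs* − τ)/β in (6), the zero-profit wage can be written out explicitly (our rearrangement; the chapter states w_r* = Π_r/ϕ without expanding it):

w_r^{*} = \frac{\Pi_r}{\phi} = \frac{L}{\beta\phi}\left[\lambda_r \left(p_{rr}^{*}\right)^2 + (1-\lambda_r)\left(p_{rs}^{*} - \tau\right)^2\right],

so the wage aggregates the operating margins earned on local and export sales, rises with total population L, and falls with the labor requirement ϕ.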

2.2 The Formation of Manufacturing Cities

The size of the product and labor markets is endogenous when consumers are mobile. Indeed, when consumers move from one city to the other, they bring with them both their production and consumption capacities. As a consequence, the numbers of consumers and workers change in the two cities. The locational choice made by a consumer is driven by the indirect utility associated with (2):

V_r(\lambda_r) = CS_r + w_r - UC_r + q_0, \qquad (8)

where CS_r is the consumer surplus and q_0 the equilibrium consumption of the numéraire in city r. Hence, when choosing the city where she lives, a consumer takes into account the income she earns, the level of urban costs she bears, and the consumer surplus she enjoys in the city through the supply of local and imported varieties. Thus, though the individual demands (4) are unaffected by income, the migration decision takes income into account. Everything else equal, workers are pulled toward the higher-wage region. The population becoming larger, the local demand for the manufactured good rises, which attracts additional firms. Although the present framework differs from Krugman's (1991), it captures the same effects. It also encapsulates the following fundamental trade-off, which is absent in Krugman: concentrating people and firms in a small number of large cities minimizes the cost of shipping commodities among urban areas but makes work-trips (as well as many other within-city trips) longer; when dispersion prevails, consumers bear lower commuting costs, but goods are more expensive because each city produces a smaller number of varieties.

At the equilibrium, each worker maximizes her utility subject to her budget constraint, each firm maximizes its profits, and markets clear. In particular, no consumer/firm has an incentive to change place. Denoting by λ ≡ λ_A the endogenous population share in city A, a spatial equilibrium arises at 1/2 ≤ λ* < 1 when the utility differential ΔV(λ*) ≡ V_A(λ*) − V_B(λ*) is equal to 0. When ΔV(1) ≥ 0, λ* = 1, and thus all consumers and firms are set up in city A. NEG models typically display several spatial equilibria. In such a context, it is convenient to use stability as a selection device, since an unstable equilibrium is unlikely to be observed. An interior equilibrium is stable if, for any marginal deviation away from the equilibrium, the incentive system provided by the market brings the distribution of consumers back to the original one. This is so if and only if the slope of the utility differential ΔV is strictly negative at λ*. By contrast, an agglomerated equilibrium is stable whenever it exists. Replacing each term of V_r by its expression leads to the following utility differential:

\Delta V(\lambda) = L \left( \frac{\theta}{\delta} - \frac{\Upsilon(\tau)}{\phi} \right) \left( \frac{1}{2} - \lambda \right), \qquad (9)

where

\Upsilon(\tau) \equiv \frac{\tau \left[ 4\beta(3\beta+2) - \left(6\beta^2 + 6\beta + 1\right)\tau \right]}{2\beta(2\beta+1)^2},

which is positive because τ is smaller than τ_trade. It follows immediately from (9) that λ = 1/2 is always a spatial equilibrium. This equilibrium is stable when θ exceeds δΥ(τ)/ϕ. Otherwise, the manufacturing sector is concentrated in a single city. As a result, when commuting costs steadily decrease, there is a transition from dispersion to agglomeration. The intuition behind this result is straightforward. When θ is large, urban costs are sufficiently high to prevent the emergence of a big city. By contrast, there is agglomeration when θ is small because the gains from labor income and from more local varieties overcome the land market crowding effect. Note also that a high population density, a high labor productivity, or both make agglomeration more likely, because a larger city allows individuals to consume a wider range of varieties priced at a lower level.

Though very simple, the above model allows us to understand the role played by commuting costs in shaping the space-economy. Consumers, who have a love for variety, are attracted by the city supplying the wider range of local varieties, which are cheaper than the imported varieties. By moving to this city, consumers increase the size of the local market, which makes local competition tougher. However, migration flows crowd out the land market and raise the urban costs borne by consumers residing in this city. Eventually, market clearing and labor mobility balance these various forces and select a spatial pattern involving either two small cities or one large city.

Note the difference with Krugman (1991) and the similarity with Helpman (1998): low transport costs are associated with the dispersion of activities. Indeed, when τ is very small, we have Υ(τ) ≈ 0, which implies δΥ(τ) − θϕ < 0. Consequently, firms and consumers are located in two small cities. This is because consumers have more or less the same access to the whole range of varieties, but obviate paying high urban costs through dispersion. This means that lowering transport costs induces the (partial) de-industrialization of large manufacturing cities and the relocation of manufactures to smaller cities or even rural areas. On the contrary, when τ is large but smaller than τ_trade, Υ(τ) takes on its largest value, so that δΥ(τ)/ϕ is more likely to exceed θ. Indeed, when transport costs are high, the agglomeration of the manufacturing sector allows consumers to have direct access to all varieties at a low price, while firms are able to better exploit scale economies. In other words, high transport costs are likely to be associated with the agglomeration of activities.

To sum up, a drop in the cost of shipping commodities fosters the spatial decentralization of jobs and production: Krugman's prediction is thus reversed. The intuition for this difference in results is easy to grasp. In the above model, urban costs rise when consumers join the larger city, which strengthens the dispersion force. Simultaneously, lowering transport costs facilitates inter-city trade.
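The dispersion/agglomeration threshold implied by (9) is easy to evaluate numerically. The following is a minimal sketch (ours, with illustrative parameter values, not taken from the chapter):

```python
# Locating the commuting-cost threshold implied by (9): dispersion
# (lambda = 1/2) is stable when theta > delta*Upsilon(tau)/phi; otherwise
# the stable outcome is full agglomeration (lambda = 1).

def upsilon(tau: float, beta: float) -> float:
    """Upsilon(tau) from the text; positive for 0 < tau < 2*beta/(2*beta + 1)."""
    return tau * (4*beta*(3*beta + 2) - (6*beta**2 + 6*beta + 1)*tau) \
           / (2*beta*(2*beta + 1)**2)

def delta_V(lam, L, theta, delta, phi, tau, beta):
    """Utility differential (9) between city A and city B."""
    return L * (theta/delta - upsilon(tau, beta)/phi) * (0.5 - lam)

# Illustrative parameters (assumptions for the example, not calibrated values)
beta, delta, phi, L = 1.0, 1.0, 1.0, 1.0
tau = 0.5  # below tau_trade = 2*beta/(2*beta + 1) = 2/3
theta_bar = delta * upsilon(tau, beta) / phi  # commuting-cost threshold

for theta in (0.5*theta_bar, 2.0*theta_bar):
    stable = "dispersion (lambda = 1/2)" if theta > theta_bar \
             else "agglomeration (lambda = 1)"
    print(f"theta = {theta:.3f}, threshold = {theta_bar:.3f} -> {stable}")
```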

1192

C. Gaigné and J.-F. Thisse

Combining the two forces tells us why agglomeration or dispersion arises. By contrast, in the core-periphery model developed by Krugman (1991), the spatial concentration of workers does not generate any cost in the core. Furthermore, the dispersion force stems from the presence of immobile farmers who live in the two regions. This force gets weaker when farmers can be supplied at a lower cost. Consequently, manufacturing firms choose to locate in the same region to benefit from a larger market.

2.3

Extensions

2.3.1 Agents’ Heterogeneity So far, agents have been assumed to be identical. However, it is well known that both firms and workers are heterogeneous in productivity. Accounting for heterogeneous agents gives rise to a whole range of new issues (Behrens and Robert-Nicoud 2015). In what follows, we briefly discuss the well-documented existence of a wage premium: Wages are higher in larger cities than in smaller cities (Combes and Gobillon 2015). We have seen that the agglomeration of workers leads firms to pay a higher wage to compensate workers for their higher urban costs. However, the wage premium may also occur because more talented and skilled workers sort into large cities, because large cities select the most productive firms, or both. (i) We extend the framework developed in Sect. 2.1 by assuming that the economy involves skill-heterogeneous workers. A s-worker supplies s efficiency units of labor. Since the wage per efficiency unit of labor is given by Πr/ϕ, the wage earned by a s-worker living in r is equal to wr ðsÞ ¼ sΠr =ϕ. It is then readily verified that @ 2Vr/@s@λr > 0, which means that the single-crossing property holds. In this case, the incentive to live in the larger city is stronger for the highskilled workers. Put differently, workers sort themselves across cities according to their efficiency. When the space-economy involves a large and a small city, the high-skilled workers live in the large city because their earnings are sufficiently high to cover the higher urban costs. By contrast, the low-skilled workers will choose the small city where they earn lower wages, but bear lower urban costs. Under these circumstances, big cities (or large urban regions) are more productive than small cities because they house workers endowed with higher skills. The spatial concentration of skilled workers enhances the productivity of the economy at the expense of growing regional disparities. In other words, the unequal spatial distribution of human capital is one of the main causes of regional disparities. Such a partition of the population may lead the those who are left behind to take a revenge via a wave of political populism that has strong territorial foundations. (ii) How do firms spatially sort themselves according to their total factor productivity? The intuitive argument is easy to grasp. The less efficient firms seek protection against competition with the more efficient firms by locating in peripheral areas where competition is softer. By contrast, the more efficient

New Economic Geography and the City

1193

firms choose to set up in the core because their higher productivity allows them to have a direct access to the larger market where they compete all together by setting lower prices but selling more. In other words, the selection of firms along the productivity line widens the gap between the core and the periphery. However, when the spatial separation of markets ceases to be a sufficient protection against competition from the low-cost firms, high-cost firms also choose to set up in the larger market where they have a better access to a bigger pool of consumers. This leads to the following prediction: When transport costs keep decreasing, interregional productivity differences first grow and then shrink.

2.3.2 The Impact of Local Policies The spatial distribution of economic activities is also driven by regional advantages reflecting regional differences in technological capabilities or institutional quality. (i) Commuting costs may be viewed as a proxy for the quality of intra-regional transport infrastructures. More generally, a lower of value of θ may be considered as an index whose value captures the quality of urban/regional governance through the various public services that enhance the quality of urban life. It is, therefore, worth looking at what the space-economy becomes when θA and θB are different. Assume that θA < θB. Everything else being equal, this implies that urban costs are lower in A than in B, thus allowing city A to host a larger population than B. Thus, when transport costs are low, the dispersed configuration now involves a larger city in A and a smaller one in B. However, this size difference does not imply spatial inequality because the bigger size of city A is associated with higher housing costs and longer commutes than in city B. Market mechanisms can make the city with the less efficient infrastructures the core of economy. Assume, indeed, that θA = θ and θB = θ + Θ with Θ > 0. The utility differential becomes △Vθ(λ) = △V(λ)  Θ(1  λ)/2. In this case, city A attracts all activities for the range of transport cost values given in Sect. 2.1 since △Vθ(1) = △V(1). However, agglomeration may also occur in city B if △Vθ(0) < 0 or, equivalently, θ/δ < Υ(τ)/ϕ  Θ. Therefore, for a narrower range of transport cost values, the less efficient city may become the core of the global economy. Two comments are in order. First, agglomeration can emerge everywhere, including areas with geographical disadvantage, through market mechanisms alone. Second, when θ/δ < Υ(τ)/ϕ  Θ, there are two equilibria involving agglomeration, either in city A or in city B even though the former equilibrium Pareto-dominates the latter. In other words, the equilibrium spatial distribution need not coincide with the first best distribution. (ii) It is well known that high housing costs limit the growth of large and efficient cities (Albouy and Ehrlich 2018). The “artificial scarcity” of land that stems from restrictive land use regulation, the provision of open spaces, or public policies that maintain the prices of agricultural products far above the international level, hurts new residents by reducing their welfare level or motivates a fraction of the city population to migrate away. This idea can be captured in a

1194

C. Gaigné and J.-F. Thisse

simple way by assuming that the opportunity cost of land R0 > 0 is higher in, say, region A than in B. As a result, the land rents in the two cities are now given by the following expressions: RA ðxÞ ¼ θ



 λA L  δx þ R0 2

RB ðxÞ ¼ θ

  λB L  δx : 2

The indirect utility differential implies that city B is bigger than city A at the dispersed configuration. If, for whatever reason, the total factor productivity of city A is higher than city B’s, making land more expensive in A prevents this city from exploiting its full potential through more concentration of activities. Indeed, some workers choose to stay put in the less efficient city where urban costs are lower. This illustrates how land use regulations, which often benefit the owners of existing plots and buildings, may restrict the growth of productive cities. The above two examples illustrate the following key point: Local policies may have perverse effects on the spatial organization of the economy.

2.3.3 Technological Progress in Manufacturing The bulk of NEG has focused on decreasing transport/trade costs. There is no doubt that the transport sector has benefited from huge productivity gains over the last two centuries. However, numerous other sectors have also experienced spectacular productivity gains. For this reason, Tabuchi et al. (2018) have revisited the coreperiphery model by focusing on technological progress in the manufacturing sector. In addition, these authors assume that workers are imperfectly mobile. In other words, migration costs act here as the main dispersion force. In the core-periphery model, prices and nominal wages converge when transport costs decrease. Consequently, the real wage gap, hence the incentive to move, shrinks. When one region is slightly bigger than the other, technological progress in manufacturing reduces the labor marginal (respectively, fixed) requirement in the two regions, making the larger region more attractive by increasing wages and decreasing the prices of existing varieties (respectively, by increasing wages and the number of varieties) therein. Therefore, workers move to the larger region when the wider utility differential exceeds these workers’ mobility costs. We may thus safely conclude that technological progress tends to exacerbate the difference between the two regions. As a result, technological progress in manufacturing also favors agglomeration. Like in the foregoing, urban costs slow down the spatial concentration of the manufacturing sector. Since innovations often require skilled workers who are typically more mobile than unskilled workers, the core region is likely to host the most skilled workers. This sheds light on the following important question: Is interregional labor mobility the right solution to lessen regional disparities? Not necessarily. The standard pro-migration argument holds that interregional labor movement helps to limit the impacts of economic shocks and reduce regional disparities and structural unemployment in the long run. There is no doubt that these forces are at work in the space-

New Economic Geography and the City

1195

economy. However, the pro-migration argument disregards the fact that labor is probably the most differentiated production factor: Migrants are often the top-talent and most entrepreneurial individuals of their region. When the lagging region loses its best workers, those who are left behind are likely to be worse off. The redistributional consequences of mobility are, therefore, very different from what the standard model predicts.

2.3.4 The Bell-Shaped Curve of Spatial Development Historically, it is well known that both transport and commuting costs have fallen at an unprecedented pace. Therefore, what matters is the relative evolution of these two types of spatial costs. For a long time, high transport costs have been the main impediment to trade. Nowadays, within developed countries, the cost of shipping commodities has reached a level that is probably lower than commuting costs, which remain relatively high. As a consequence, the main dispersion force no longer lies in the cost of supplying distant markets, but in the level of urban costs. Under these circumstances, one may speculate that, though economic integration would have initially fostered a more intensive agglomeration of economic activities, its continuation is liable to generate a redeployment of activities that could lead to a kind of geographical evening-out. Medium-sized and small cities are both necessary and inevitable, as firms with greater relative space requirements and lower relative human capital requirements seek out the most cost-effective locations. When they combine cheaper labor and land with a business-friendly local culture, lagging regions have a cost advantage that offsets their lack of agglomeration effects. Hence one may expect the process of spatial development to unfold according to a bell-shaped curve. To be precise, the relationship between economic integration and spatial inequality is not monotone: While the first stages of economic integration exacerbate regional disparities, additional integration starts undoing them (for a more detailed discussion of the bell-shaped curve, see Fujita and Thisse 2013).

2.4

The Decentralization of Jobs Within Cities

As seen above, globalization could well challenge the supremacy of large cities, the reason being that the escalation of urban costs would shift employment from large monocentric cities to small cities where these costs are lower. However, this argument relies on the assumption that cities have a monocentric morphology. The main point we wish to stress here is that decentralizing the production of goods in secondary business districts (SBD) may allow large cities to retain a high share of firms and jobs. Under these circumstances, firms are able to pay lower wages while retaining most of the benefits generated by large urban agglomerations. As they enjoy living on larger plots and/or move along with firms, consumers may also want to live in suburbia. Consequently, the creation of subcenters within a city, that is, the formation of a polycentric city, appears to be a natural way to alleviate the burden of urban costs.

1196

C. Gaigné and J.-F. Thisse

For the redeployment of activities in a polycentric pattern to happen, firms set up in SBDs must be able to maintain a good access to the main urban center, which requires low communication costs. For example, a large share of the business services consumed by US firms located in suburbia are supplied in city centers. By focusing on urban and communication costs, we recognize that both agglomeration and dispersion may take two quite different forms because they are now compounded by the centralization or decentralization of activities within the same city. Such a distinction is crucial for understanding the interactions between cities and trade.

2.4.1 Polycentric Cities We extend the above model by allowing manufacturing firms to locate in the CBD or to form a SBD on each side of the CBD. Both the CBD and the SBDs are surrounded by residential areas occupied by consumers. Because the higher-order services are still provided in the CBD, firms established in a SBD must incur a communication cost K > 0 so that the profit of a firm located in a SBD is given byπ r  K whereas π r is the profit of a firm established in CBD. In what follows, the superscript C is used to describe variables related to the CBD, whereas S describes the variables associated with a SBD. Denote by yr the right endpoint of the area formed by residents working in the CBD and by zr the right endpoint of the residential area on the right-hand side of the SBD, which is also the outer limit of city r. Let xSr be the center of the SBD in city r. It is easy to show that these points are given by yr ¼

σ r λr L 2δ

xSr ¼

ð1 þ σ r Þλr L , 4δ

(10)

where σ r < 1 is the share of jobs located in the city r-CBD. Individuals choose their workplace (CBD location,  or SBD) and their  residential  given the land rents and wages in the CBD wCr and the SBD wSr . The wage wedge between the CBD and a SBD is given by   θ 3σ r  1 λr L, wCr  wSr ¼ θ 2yr  xSr ¼ δ 4

(11)

where we have used the expressions for yr and xSr given in (10). In other words, the difference in the wages paid in the CBD and in the SBD compensates exactly the worker for the difference in the corresponding commuting costs. Moreover, the wage wedge is positive as long as σ r > 1/3, that is, the size of the CBD exceeds the size of each SBD. Note also that a larger population in city r raises the wage wedge. Indeed, as the average commuting cost rises, firms located in the CBD must pay a higher wage to their workers. At each workplace (CBD or SBD), the equilibrium wages are determined by a bidding process in which firms compete for workers by offering them higher wages until no firm earns positive profits. Hence, the equilibrium wages are related through

New Economic Geography and the City

1197

 S  C S the following expressions: wC r ¼ wr and wr ¼ wr  K. Substituting wr and wr into (11) and solving with respect to σ r yields the equilibrium relative size of the CBD in city r:

σ r ¼ min



1 4δK þ ,1 , 3 3θϕλr L

  which always exceeds 1/3. Clearly, the city is polycentric σ r < 1 if and only if K
2(1 + p)/ϕ, the symmetric equilibrium is stable. This is because the gains from variety do not compensate consumers for the higher urban costs they would bear in the large city. In this case, instead of seeking variety, consumers aim to reduce urban costs. The population is thus equally dispersed. To sum up, when commuting costs steadily decrease a service economy shifts from dispersion to agglomeration because the latter allows individuals to consume all services and to earn higher wages. In other words, urbanization may arise in the absence of industrialization. The existence of large consumption cities in several developing countries is evidence that urbanization may arise in the absence of manufacturing sectors.

3.2

Extensions

3.2.1 Spatial Externalities Cities generate costs and benefits that arise from non-markets interactions (Duranton and Puga 2004). We focus here on agglomeration economies that affect positively the productivity of firms located in the same city because spillovers are typically subject to strong distance-decay effects (Carlino and Kerr 2015). More precisely, we assume that ϕr is such that 1=ϕr ¼ Ler =ϕ where e > 0 is the elasticity of agglomeration economies with respect to population size. Likewise, we consider the negative externality associated with traffic congestion by assuming that the unit commuting cost is given by θr ¼ θLκr with κ > 0. The objective is to study the implications of these externalities for the inter-city distribution of activities.

1200

C. Gaigné and J.-F. Thisse

The labor productivity of city r is such that nr qr =Lr ¼ Ler p=ϕ, the equilibrium 1þκ wage is wr ¼ pL1þe r =ϕ, while urban costs are equal to θLr =2. Plugging these new terms in Vr leads to the following utility differential:  L θ ΔV ðλÞ ¼  fðλLÞκ λ  ½ð1  λÞLκ ð1  λÞg 2 δ  2ð1 þ pÞ e e ð λL Þ  λ  ½ ð 1  λ ÞL  ð 1  λ Þ f g : ϕ Agglomeration is a stable equilibrium if and only if ΔV(1) > 0. This condition amounts to θ < 2δ

1 þ p eκ L  θA : ϕ

On the other hand, when θ > θD  2δ

  e þ 1 1 þ p L eκ κþ1 ϕ 2

holds, the symmetric equilibrium is stable because the slope of ΔV(λ) is negative at λ = 1/2. Note that θA < θD when agglomeration economies are stronger than the congestion effect (e > κ). In this case, a larger city and a smaller city emerge in equilibrium when θA < θ < θD. In other words, spatial externalities foster the emergence of an urban hierarchy. By contrast, when the congestion externalities are strong enough (κ > e), we have θD < θ < θA so that both agglomeration (λ = 1) and dispersion (λ = 1/2) are stable equilibria. In sum, spatial externalities that percolate within cities affect the distribution of activities between cities.

3.2.2 Exogenous Amenities Natural and historical amenities are typical examples of untradable goods that consumers enjoy. The literature on migration shows that amenities play a significant role in the location of consumers. Since consumers prefer more amenities than less, it is reasonable to consider a preference structure similar to the one used in models of vertical product differentiation, that is, U ar ¼ ar U ðq0 , qi Þ where ar is the amenity level in city r. In this case, the indirect utility function is given by V ar ¼ ar V r ðλr Þ. In what follows, we consider two opposite, but relevant, cases. In the former, consumers have different tastes while the amenity level is the same in the two cities. In the latter, consumers have different incomes while the amenity level varies across city locations. (i) Assume that the amenity level a is the same in both cities. Since consumers are heterogeneous in their perception of cities, the matching value between a

New Economic Geography and the City

1201

consumer and city r is described as the realization of a random variable er, where the variables er are i.i.d. across consumers according to the Gumbel distribution with zero mean and a variance equal to π 2v2/6. A larger value of ν means that consumers are more heterogeneous in taste. The probability that a consumer chooses city r is given by the binary logit: ℙ ðλÞ ¼

exp½aV r ðλÞ=v : exp½aV A ðλÞ=v þ exp½aV B ðλÞ=v

(15)

A spatial equilibrium is now such that the flow of in-migrants is equal to the flow of out-migrants: (1  λ)ℙ(λ) = λℙ(1  λ). Taking the logarithm of this expression and plugging (15) yields the equilibrium condition: v λ ¼ 0, ΔV ðλÞ  log a 1λ

(16)

+

where ΔV(λ) is given by (14). The symmetric equilibrium is stable if and only if θ/δ > 2(1 + p)/ϕ  4v/a holds because the slope of the left-hand side of (16) is negative at λ = 1/2. Hence, the more heterogeneous the population of consumers (ν ), the more likely the dispersed configuration. In other words, taste heterogeneity acts as a dispersion force. When commuting costs are such that θ/δ < 2(1 + p)/ϕ  4v/a, the spatial equilibrium involves a larger city and a smaller city. In other words, there is partial agglomeration, namely 1/2 < λ < 1. Furthermore, since dλ /dθ < 0 and dλ /da > 0, lower commuting costs or a higher amenity level leads to a wider range of services in the bigger city. Indeed, the gains associated with agglomeration are magnified with the amenity level because a acts as a multiplication constant of ΔV(λ). (ii) We now consider the case of consumers heterogeneous in incomes w. In a featureless city, there is spatial segregation, namely the distance between consumers’ residential locations is perfectly correlated to their income gap (Fujita 1989). However, the relative location of different income groups is also affected by the spatial pattern of urban amenities. For example, Cuberes et al. (2019) find no evidence of a strong relationship between income and household distance to the center of English cities. We assume that the amenity level ar(x) varies across locations within city r. Furthermore, the unit commuting cost increases with income because the opportunity cost of time rises with income. This cost is now given by θw rather than θ. Consequently, the indirect utility of a w-consumer located at x in city r is such that

V ar ðx,wÞ ¼ ar ðxÞ ð1  θxÞw  Rr ðxÞ=δ þ nr þ q0 : By definition, the bid rent Ψr(x, w) is such that the w-consumers are indifferent across locations in city r. Therefore, Ψr(x, w) must solve the equilibrium condition

1202

C. Gaigné and J.-F. Thisse

dV ar =dx ¼ 0 (Fujita 1989). This solution can be shown to be given by the following expression:  k r ðwÞ  Ψr ðx,wÞ ¼ δ ð1  θxÞw þ nr þ q0 þ , ar ð x Þ where kr(w) is a constant that depends on w. The problem consists in assigning consumers having particular incomes to specific locations within city r under the form of a mapping w(x) that specifies the income of consumers assigned to x. Since land at x is allocated to the highest bidder, the equilibrium income mapping w (x) must solve the utility-maximizing condition @Ψr/@w = 0, while the second-order condition implies @ 2Ψr/@w2 < 0. Totally differentiating @Ψr/@w = 0 with respect to x yields the equilibrium income mapping as a solution to the following differential equation:   2 1 dwr @ Ψr ðx,wÞ @ 2 Ψr ðx,wÞ ¼ : @w2 @w@x w ðxÞ dx w ð x Þ Since @ 2Ψr/@w2 < 0, @Ψr/@w@x and dw(x)/dx have the same sign. Hence the interaction between the amenity and commuting cost functions determines the social stratification of the city through the behavior of the function @ 2Ψr/@ w@x. It also follows from @Ψr/@w = 0 that @kr(w)/@w =  ar(x)(1  θx). This equality implies  @ 2 Ψr 1 @ar ðxÞ θ þ ¼ ð1  θxÞ : ar ðxÞ @x 1  θx @w@x Set Λr(x) = ar(x)(1  θx), a magnitude that has the nature of an hedonic locational index. Differentiating Λr(x) with respect to x shows that dΛr(x)/dx and @ 2Ψr/@w@x have the same sign. The income sorting of consumers within city r is thus driven by the behavior of Λr(x) with respect to x. When the equation @ 2Ψr/@ w@x = 0 have several solutions, there is imperfect sorting in that greater income differences are no longer mapped into more spatial separation. The affluent locate where the hedonic locational index is maximized, whereas the poor reside in locations with the lowest hedonic locational index. Inside the remaining areas, various residential patterns may emerge. For example, the middle class may be split into two spatially separated neighborhoods with the poor in between, while the rich live near the city center. How consumers choose a city is determined by combining the two urban spaces within a location set in which the two CBDs are separated by a given distance. Repeating the above argument shows that consumers select locations, which are both a city location and a residential location in this city.

New Economic Geography and the City

4

1203

The Size and Industrial Structure of Cities

We now take a broader perspective by considering an economy with a manufacturing sector supplying a tradable good and a sector producing a untradable service for local consumption. In this economy, labor is perfectly mobile between cities and sectors. Such a two-sector setting generates a richer set of results than models with a single industry as the objective is to determine the sectorial and spatial structure of the economy.

4.1

Manufactured Goods and Services

The manufactured good is denoted by 1 and the service by 2. The utility derived from consuming qi units of a variety i of good j = 1, 2 is given by (1). The parameters associated with the utility arising from consuming one variety of the manufactured product or of the consumption service are identical. This assumption, which is made for analytical simplicity, does not affect qualitatively the properties of the spatial equilibria. Indeed, since good 1-varieties are available everywhere at the same price, the consumer surplus generated by the consumption of the manufactured good is the same regardless of the city where consumers live. Furthermore, the profits earned by the manufacturing firms are also the same regardless of the city where firms are located. Consequently, the equilibrium values of the consumer surplus and wage associated with good 1 do not play any role in workers’ migration decision. Hence, assuming that the parameters of (1) are the same for goods 1 and 2 entails no significant loss of generality. Preferences involve two goods and are given by 

U q0 ; qij



XZ

Z nj βnj qij di  q2 di ¼ 2ðn1 þ n2 Þ 0 ij j¼1,2 0  Z nj Z nj 1  qij qkj dk di þ q0 , 2ð n1 þ n2 Þ 0 0 nj

(17)

where qij is the quantity of variety i  [0, nj] of good j = 1, 2. Since β > 0, (17) encapsulates both a preference for diversity between goods 1 and 2, as well as a preference for variety across varieties of the same good. Since the second term of (17) is weighted by the ratio nj/(n1 + n2), the intensity of the love for variety varies between services and the manufactured good. In particular, a good or service supplied as a small range of varieties has more impact on the consumer wellbeing than a good or service made available through a large array of varieties. Good 1 is assumed to be freely tradable, so that the total number n1 of good 1-varieties is available in both cities. By contrast, a consumer living in city r = A, B has access to the number n2r of good 2-varieties supplied in city r. In a one-sector

1204

C. Gaigné and J.-F. Thisse

economy, the distribution of firms and workers is a priori undetermined when transport costs are zero. As shown below, the presence of the service sector is sufficient for a specific distribution of the manufactured sector to emerge. Setting τ = 0 in (7) yields the equilibrium price (12) of good 1, which is now constant and the same in both cities. This implies firms pay the same wage w1 to all their workers because they earn the same operating profits regardless of their locations. Let λir be the endogenous number of consumers working in sector i = 1, 2 and living city r = A, B. Labor market-clearing implies n1 ¼

ðλ1A þ λ1B ÞL ϕ1

n2r ¼

λ2r L , with r ¼ A, B: ϕ2

Since labor is mobile between sectors, in equilibrium it must be that w1r = w2r = wr for r = A, B. Setting λr  λ1r + λ2r for the population residing in city r, the budget constraint of a consumer residing in this city is as follows: n1 p1r q1r þ n2r p2r q2r þ

θ λr L þ q0 ¼ q0 þ wr : δ 2

It can readily be shown that the individual demand for a good i-variety in city r is given by   1 p n2r p1  1þ , 1þ 1 þ β β β ðβ þ 1 Þ n1    1 p n1 p2r  2r þ : q2r ¼ 1þ 1þβ β β ð β þ 1Þ n2r 

q1r ¼

(18)

(19)

Although this demand system involves no income effect, it displays a rich pattern of substitution via the relative number of varieties. More specifically, when the number of good i-varieties available in city r increases, the individual demands for good j-varieties are shifted upward because good j becomes relatively more attractive. Furthermore, the size and distribution of the service sector (n2r) affects individual demands for the manufactured good in each city (see (19)). In particular, a growing service sector has a positive impact on the demand for the tradable good. In addition, the demand system (18) and (19) highlights one of the main differences with the one-sector model, that is, the number of varieties ni produced in industry i is endogenous because it depends on the mobility of workers between sectors. Finally, note that the size of the manufacturing sector (n1) affects the individual demand for services in each city and, therefore, the spatial distribution of this sector. In contrast, the distribution of manufactures has no direct impact on individual demands for good 1 because trading this good is costless. This suggests that these firms are indifferent between locations. Nevertheless, they are not because their workers are attracted by cities supplying a wider range of services.

New Economic Geography and the City

1205

Let π 1r ¼ p1 ½q1A ðp1 ÞλA þ q1B ðp1 ÞλB L  ϕ1 wr be the profits earned by a manufacturing firm established in city r. As in Sect. 2, when choosing its own price, each firm treats parametrically the wage wr, as well as the average prices p1A and p1B . Profits being zero in equilibrium, it follows from (18) that the wage paid by a manufacturing firm is equal to w1 ¼ p

X nr λ r L , nϕ r¼A,B 1 1

(20)

where nr = n1 + n2r, while p is defined in (13). The profits made by a service firm set up in city r are given by π 2r ¼ p2r q2r λr L  ϕ2 wr , where p2r is the price quoted by such a firm. Since the equilibrium price of a good 2-variety is given by (12), the equilibrium wage paid by service firms located in city r is given by w2r ¼ p

nr λ r L , n2r ϕ2

(21)

which varies with the size (λrL ) and the sectorial mix (nr/n2r) of city r. Note that the service sector is never agglomerated, for otherwise w would become arbitrarily large (set n2r = 0 in (21)). Since the individual consumer surplus S = 1, the welfare of a consumer working in sector i and living in city r is given by the following expression: V ir ¼ n1 þ n2r þ w 

θ λr L , δ 2

which shows how consumers’ wellbeing depends on the spatial and sectorial distribution of jobs. It follows from this expression that, in equilibrium, the urban cost differential is exactly compensated by the difference in the number of untradable services supplied in each city. In other words, consumers choose to live in a larger city where they bear higher urban costs because they have access to a wider array of local services. A spatial-sectorial equilibrium arises when no worker has an incentive to change place and/or to switch job. How to describe the mobility of workers between regions and sectors? Tabuchi and Thisse (2006) treat the choices of jobs and locations in a symmetric way and borrow the myopic evolutionary dynamics used in NEG to study the mobility of workers across several regions (Fujita et al. 1999):

1206

C. Gaigné and J.-F. Thisse

λ_ir ¼ λir ðV ir  V Þ, where V ¼ Σi Σr λir V ir is the average utility in the global economy. The perfect mobility of labor between sectors implies factor price equalization between cities and sectors, that is, w1 = w2A = w2B. It then follows  from (20) and  (21) that the labor force is equally split between the two sectors λ1 ¼ λ2 ¼ 1=2 . Note that this equality does not mean that workers and sectors are equally distributed between the two cities. The utility differential can be shown to be given by V ir  V ¼ λs L



    2δ  θϕ2 1 θ 1 λ1r  λ2r   , 4 δ 4 δϕ2

where λs is the city s-population with s 6¼ r. Solving this system of equations shows that there are two candidate equilibria (up to a permutation between A and B): λ1A ¼ λ2A ¼ 1=4

(22)

pffiffiffiffi pffiffiffiffi Δ 1 ð2δ  θϕ2 Þ Δ  1 λ1A ¼ þ λ2A ¼ þ , 4 4θϕ1 ϕ2 4 4ϕ1

(23)

and

where Δ  ϕ21 þ 2ϕ1 ϕ2  2ϕ1 ϕ22 θ=δ . As in Sect. 2, the symmetric pattern (22), which involves two cities having the same size and the same industrial mix, is always a spatial-sectorial equilibrium. On the other hand, the asymmetric configuration (23) is a stable equilibrium if and only if Δ > 0, that is,   ϕ1 1 þ : θ 1 is the “regional multiplier”. Therefore, a shock that makes  the   basic sector larger λ1A boosts a more than proportionate growth of the city size λA by attracting a larger number of service firms. However, a larger nonbasic/ service sector also leads to the expansion of the manufacturing sector, and thus the nonbasic sector can also serve as an engine of urban growth. Observe also that, the impact of the service sector on total employment is higher in the larger city because this sector is relatively more concentrated in city A than in city B. Indeed, the equilibrium condition ViA = ViB implies λA ¼

ϕ2 θ  δ 2δ  þ λ , 2ϕ2 θ ϕ2 θ 2A

so that the regional multiplier of the nonbasic sector exceeds of the regional multiplier of the basic sector when θ > δ/ϕ2, an inequality that is more likely to hold when the productivity in the service sector is high. To sum up, manufacturing sectors are not necessarily a more powerful engine of urban growth than service sectors.

4.1.2 Urban Hierarchy As long as θ < 2δ/ϕ2 holds, the larger city supplies a larger array of varieties of each good than the smaller city (λ1A > 1=4 > λ1B and λ2A > 1=4 > λ2B ). In this case, the

1208

C. Gaigné and J.-F. Thisse

urban system displays a Christaller-like hierarchy: By supplying a wider range of services, city A attracts more consumers than city B. Though the demand for the manufactured good is higher in A (see (18)), this city does not attract all manufactured workers because this good is shipped at zero cost. Thus, the process of circular causation comes to an end before reaching the full agglomeration of manufactures. Note, however, that the population gap between the two cities grows when the service sector becomes more productive.

4.1.3 Relative Comparative Advantage When θ > 2δ/ϕ2 holds, the larger city has a larger labor share in the service sector, whereas the smaller city has a larger labor share in the manufacturing sector: λ1B 1 λ1A > >  2 λB λA

λ2A 1 λ2B > >  : 2 λA λB

In other words, the urban system involves relatively specialized cities in that the manufacturing sector is relatively less concentrated than services. This is because the size advantage associated with the larger city no longer compensates the remaining manufacturing workers for the higher urban costs they would bear therein. This result is consistent with the theory of relative comparative advantage: The larger city has a comparative advantage in untradables because it has a larger local market, while the smaller city’ comparative advantage lies in its lower urban costs. Note that cities’ comparative advantage are not exogenous. Rather, they emerge as the outcome of market interaction and labor mobility. This relative specialization also agrees with the empirical evidence: Services have a taste for large cities that manufactures no longer have.

4.2

The Impact of Trade on the Structure of Cities

The setting developed above may be used to shed light on the impact of international trade on the distribution of activities between cities and sectors in the case of a small open economy which imports additional   varieties of the manufactured good from the rest of the world. Denote by nd1 nω1 the endogenous (exogenous) number of domestic (foreign) varieties of this good. Therefore, the total number of good 1-varieties available in the small open economy is n1 ¼ nd1 þ nω1 where nd1 ¼ λ1 L=ϕ1 . The profit earned by a foreign firm exporting to the small economy is given by 

pω1  T



 qω1A λA L þ qω1B λB L ,

where pω1 is the common price of a foreign variety in each city and T > 0 the level of international trade costs, while qf1r is the individual demand for this variety in city r = 1, 2:

New Economic Geography and the City

qω1r

1209

 ¼

1 pω p1  1 þ 1 þ β β β ð β þ 1Þ

  n2r n1 p1 þ nω1 pω1 : with p 1 ¼ 1þ n1 n1

The difference with (18) and (19) is that the average price p1 depends also on the prices of the foreign varieties. The profit-maximizing conditions show that the consumer prices of the manufactured good charged by the foreign and domestic firms are, respectively, given by d pω 1 ¼ p1 þ

T 2

pd 1 ¼

β Φðλ1 Þ, 2β þ 1

where Φðλ1 Þ  1 þ

T nω1 : d 2β n1 þ nω1

(24)

d Observe that the equilibrium prices pω 1 and p1 of the manufactured good capture effects associated with a larger number of imported varieties theω pro-competitive  n1 " , lower trade barriers (T#), and higher labor productivity in the manufacturing  sector (ϕ1#). When nω1 ¼ 0, we fall back on the expression pd 1 ¼ p1 obtained above. We have seen that w1 ¼ w2A ¼ w2B holds in equilibrium. The zero-profit condition then implies that the equilibrium allocation of labor between the two sectors is given by the solution to the equation:

λ1 ðΦÞ ¼

1  nω1 ϕ1 =LΦ , 1=Φ þ 1

(25)

which shows that λ1 increases with Φ. Plugging (25) in (24) yields the equation: Φ¼1þ

  T 1 nω1 þ1 , 2β Φ L=ϕ1 þ nω1

which has a unique solution Φ . Clearly, dΦ /dT > 0, so that dλ1 =dT > 0 since λ1 increases with Φ. In other words, lowering trade barriers triggers the deindustrialization of the small economy. The intuition is easy to grasp. Although foreign competition exerts a downward pressure on domestic prices, hence on wages, in the manufacturing sector, sectorial mobility allows workers to dampen down the wage drop by changing jobs. Lower trade barriers or a larger number of foreign varieties reduce the share of the labor force working in the manufacturing sector by making the service sector relatively more attractive. As in the foregoing, two types of spatial equilibria may emerge. In the first one, the two cities have the same size, hence λ1r ¼ λ1 =2 and λ2r ¼ λ2 =2, when commuting costs are high enough, i.e., θ > δθω with

1210

C. Gaigné and J.-F. Thisse



 1 þ nω1 ϕ1 =L ðϕ1 þ 2ϕ2 Φ Þ θ  : ϕ22 Φ ð1 þ Φ Þ ω

Since dθω/dΦ < 0 and dΦ/dT > 0, the condition θ > δθω becomes more stringent, and thus urban hierarchy is more likely to emerge when trade costs decrease. By contrast, when θ < δθω, the distribution of workers between cities is such that λA



λB

pffiffiffiffiffiffi 2δ Δω ¼ >0 θϕ1 ϕ2

with Δω 

ϕ1 ϕ22

  ðθω  θÞ 1 þ nω1 ϕ1 =L Φ : δ ð1 þ Φ  Þ

Cumbersome, but standard, calculations show that, once commuting costs take on sufficiently low values, λA  λB increases when T decreases. In this case, more intense competition from the rest of the world implies an employment shift to the larger city where workers consume a wider range of local services. To sum up, in a small open economy where commuting costs are not too high, trade liberalization fosters deindustrialization and exacerbates the unequal distribution of jobs between cities.

5

The Future of Cities

Can NEG models such as those surveyed in the foregoing sections help understand some of the main challenges faced by cities. In what follows, we consider four issues that have important policy implications: (i) The higher number of crimes in larger cities, (ii) the environmental impact of the rapid urbanization in emerging countries like China and India, (iii) the growing share of retirees in Japan and European countries, that is, individuals whose income does not stem from labor, and (iv) the impact of new information and communication technologies, which are often described in the media as a substitute for physical proximity.

5.1

Urban Crime

There is relatively more crime in big cities than in small cities. For example, the rate of violent crime in American cities with more than 250,000 inhabitants is 346 per 100,000 inhabitants, whereas in cities with less than 10,000 inhabitants, the rate of violent crime is 176 per 100,000. Gaigné and Zenou (2015) provide a microfoundation to these empirical facts by extending the model developed in Sect. 3.1. An individual chooses to be either a worker or a criminal. Hence λrL = Cr + Wr where Cr (resp., Wr) is the mass of criminals (resp., workers) in city r. Individuals are assumed to be heterogeneous in their incentives to commit crimes. More specifically, individuals have different aversions to crime, that is, a higher

New Economic Geography and the City

1211

c means more aversion towards crime. Individuals are rational decision-makers who engage in legal or illegal activities according to the expected utility generated by each activity. Denoting by ξ the lump-sum amount stolen by a criminal, a worker’s income is now given wr  ξCr, while the income of a criminal is ξWr. On average, each worker is “visited” by Cr criminals who each takes ξ; each criminal “robs” Wr workers and takes ξ from each. Therefore, the average proceeds from crime are equal to ξLr. Although there is no direct competition between criminals, they are indirectly connected via the total mass of workers living in a city because more criminals induce less workers, which, in turn, lowers pecuniary returns per crime. Criminals travel less than workers. For simplicity, we normalize criminals’ commuting costs to zero. In equilibrium, workers bid away criminals who will live at the city fringe. Urban costs borne by a worker are now given by θWr/(2δ) instead of θλrL/(2δ).

5.1.1 Criminal Activity and City Size The indirect utility of a worker is given by V r ¼ nr þ wr  ξC r  θW r =2δ þ q0 , whereas the indirect utility of a criminal is V cr ¼ nr þ ξW r  c þ q0. The individual indifferent between committing a crime and working is such that V r ¼ V cr. Assuming that c is uniformly distributed on the interval [0, 1], the fraction of criminals in a city is given by ψ r  Cr/λr. The equilibrium share of criminals is thus given by the expression: ψ r ¼

θ=δ  2ðp2 =ϕ  ξÞ , θ=δ þ 2=λr L

which is always smaller than 1 when p2/ϕ > ξ, a condition that must hold for a city to exist. Hence the total mass of criminals in the economy depends on the inter-city distribution of individuals λ: C ðλÞ ¼ λLψ A þ ð1  λÞLψ B : It can readily be shown that @C(λ)/@λ > 0 for λ  1/2. Even though wages increases with city size, the agglomeration of activities gives rise a multiplier effect through the following two channels. First, a larger city size induces greater expected pecuniary returns because criminals face a larger number of potential victims. Second, a more populated city diminishes the opportunity cost of illegal activity. Even if the amount of stolen wealth by each criminal, that is, ξ(1  ψ r)λrL, is proportional to city size, there is proportionally more crime in bigger cities than in smaller cities. Furthermore, @ψ r =@θ > 0 when ψ r < 1. Hence, a worse job access leads to more criminal activities within a city. When commuting costs increase, total urban costs also increase, so that the net wages of workers are reduced. This, in turn, leads to a larger fraction of criminals.

1212

C. Gaigné and J.-F. Thisse

5.1.2 The Emergence of Large Cities with High Criminal Activity Individuals choose in which city to reside without knowing their types, which are revealed after their location choices. Therefore, the location of individuals is driven by the inter-city difference in their expected utility: Z EVr ¼

c^r 0

Z V cr dc

þ

1 c^r

V r dc ¼

  1   2 ψ þ V r ψ r : 2 r

In addition to the various forces presented in Sect. 3.1, the location of individuals is affected by the level of criminal activity. The different mechanisms interact with the decision to become a criminal, hence with the level of wages, urban costs, and λ. The expected utility differential is given by EVA  EVB ¼

  1 λ  Γ, 2

where Γ is a positive or negative bundle of parameters. Gaigné and Zenou (2015) show that, when commuting costs take on intermediate values, the larger city hosts more workers and more criminals. Furthermore, since ψ A > ψ B and EVA = EVB when 1/2 < λ < 1, workers living in the smaller city are better off than those living in the larger city.

5.2

Are Compact Cities Ecologically Desirable?

The transport sector is a large and growing emitter of greenhouse gases (GHG). It accounts for 30% of total GHG emissions in the United States and approximately 20% of GHG emissions in the EU-15. Moreover, road-based transport accounts for a very large share of GHG emissions generated by this sector. For example, in the US, nearly 60% of GHG emissions stem from gasoline consumption for private vehicle use, while a share of 20% is attributed to freight trucks, with an increase of 75% from 1990 to 2006. Although new technological solutions will improve energy efficiency, other initiatives are needed, such as mitigation policies based on the reduction of average distances travelled by commodities and people.

5.2.1

The Ecological Trade-Off Between Commuting and Transport Costs We have seen that transporting people and commodities involves economic costs. It also implied ecological costs that obey the fundamental tradeoff of Sect. 2.1: The agglomeration of firms and people in a few large cities minimizes the emissions of GHG stemming from shipping commodities, but increases those generated by longer commuting; dispersing people and firms across numerous small cities has the opposite costs and benefits. If cities are more compact (δ"), then, keeping population and firms fixed, the costs associated with the former spatial configuration (concentration) fall relative to those associated with the

New Economic Geography and the City

1213

latter (dispersion) because people commute over shorter distances. However, when one recognizes that firms and people choose their location in order to maximize profits and utility, a policy that aims to make cities more compact will affect the inter-city pattern of activities by fostering their progressive agglomeration, thus raising the level of GHG within fewer and larger cities. Therefore, the ecological effects of an increasing-density policy are a priori ambiguous (Gaigné et al. 2012). To illustrate how this trade-off operates, we consider the model of Sect. 2.1 and assume that the carbon footprint E of the urban system stems from the total distance travelled by commuters within cities (D) and the total quantity of the manufactured good shipped between cities (T ): E ¼ eD D þ eT T , where eD is the amount of GHG generated by one unit of distance travelled by a consumer, while shipping one unit of the manufactured good between cities generates eT units of carbon dioxides. Because consumers are symmetrically distributed on each side of the CBD at the equilibrium, the value of C depends on the inter-city distribution of the manufacturing sector and is given by D¼

 L2  2 λr þ λ2s : 4δ

Clearly, the emission of GHG stemming from commuting is minimized when the manufacturing sector is evenly dispersed between two cities (λA = λB = 1/2). Regarding the value of T, it is given by the sum of equilibrium trade flows: T¼

ϕ½4β  ð4β þ 1Þτ λA λB , 2ð2β þ 1Þβ

where T > 0 because τ < τtrade. As expected, T is minimized when consumers and firms are agglomerated within a single city. Note also that T increases when shipping goods becomes cheaper because there is more inter-city trade. Hence, transportation policies that reduce shipping costs foster a larger emission of GHG. Thus, E is described by a concave or convex parabola in λ, so that the emission of GHG is minimized either at λ = 1 or at λ = 1/2. Therefore, it is sufficient to evaluate the sign of E(1; δ)  E(1/2; δ), which is negative if and only if δ > δe where δe 

eD ð2β þ 1Þβ , eT ϕ½4β  ð4β þ 1Þτ

which is positive because τ < τtrade. Thus, the agglomeration of activities within a single compact city is ecologically desirable if and only if δ > δe. Otherwise, dispersion is the best ecological outcome. Consequently, agglomeration or dispersion is not by itself the more preferable pattern from the ecological viewpoint. For

1214

C. Gaigné and J.-F. Thisse

agglomeration to be ecologically desirable, the population density must be sufficiently high for the average commuting distance to be small enough.

5.2.2 Does the Market Yield a Good, or a Bad, Ecological Outcome? As seen in Sect. 2.1, λ = 1/2 is a stable equilibrium if the population density δ is smaller than δm  ϕθ/Υ(τ). Otherwise, the manufacturing sector is concentrated into a single city. Because δm increases with θ and is equal to 0 at θ = 0, while δe > 0 is independent of θ, there exists a unique value θ such that δm = δe. As a result, the market delivers a configuration that minimizes or maximizes the emissions of GHG. In other words, the market yields either the best or the worst ecological outcome. Consider, first, the case where θ exceeds θ. If δ < δm, the market outcome involves two cities. Keeping this configuration unchanged, a more compact city (δ") always reduces the emissions of pollutants. Once δ exceeds δm, the economy gets agglomerated, thus leading to a downward jump in the GHG emissions. Further increases in δ allow for lower emissions of GHG. Hence, when commuting costs are high, a denser city always yields lower emissions of GHG. Assume now that θ < θ. As in the foregoing, provided that δ < δm, the market outcome involves dispersion while the pollution level decreases when the city gets more compact. When δ crosses δm from below, the pollution now displays an upward jump. In other words, when commuting costs are low, more compact cities are not always ecologically desirable. When consumers and firms are mobile, what matters for the total emission of GHG is the mix between city compactness (δ) and city size (λ), thus pointing to the need of coordinating environmental policies at the local and global levels. In other words, environmental policies must focus on the urban system as a whole and not on individual cities. Furthermore, we have seen in Sect. 2.4 that the internal structure of cities may change with population density. In this case, the ecological effects of an increasing-density policy are even more ambiguous: More compactness favors the centralization of jobs at the city center. Unless commuting to SBDs generates a massive use of private cars, compact and monocentric cities may generate more pollution than polycentric and dispersed cities. By lowering urban costs without reducing the benefits generated by large urban agglomerations, the creation of SBCs may allow large cities to reduce GHG emissions while benefiting from stronger agglomeration economies. In sum, building tall cities is part of the answer to the question that serves as a title to this section. However, we contend that policy-makers should also pay attention to the structure and number of cities.

5.3

Cities in Aging Nations

The old-age dependency ratio (the ratio people aged 65 and older to people aged 15 to 64) is projected to double by 2050 within the European Union, with four persons of working age for every elderly citizen to only two. This ratio is expected to be lower in the United States, with a rise from 19 to 32%, but higher in Japan, with a rise from 25 in 2000 to 72% in 2050. Such demographic changes are likely to have a

New Economic Geography and the City

1215

major impact on cities because the retirees are driven by location factors that differ from those governing workers’ residential choices. Workers’ welfare depends on local services, land rent and wages, whereas rentiers’ welfare depends only upon local services/amenities and land rent. As a consequence, when the share of old people takes on a sufficiently high value, the process of circular causation sparked by workers’ location choice could well be challenged. To study how the urban system might change as the old-age dependency ratio rises, we consider the model of Sect. 4.1 in which the population is split between two groups, i.e., the elderly and the workers whose respective shares are ρ  0 and 1 – ρ  0. City A is endowed with an amenity a > 0 that is valued only by the elderly, while B is called the working-city. We close the model by assuming that land is collectively owned by the elderly. The income of a retiree is, therefore, given by the aggregate land rent (ALR) divided by the total number of elderly (ρL). Workers and retirees have different unit commuting costs, t and θ, respectively. Let sr be the share of elderly people living in city r = A, B. City r-population is then given by λr = (λ1r + λ2r)(1 – ρ)L + srρL. Besides λ1r and λ2r, we also have to determine sr. Workers and retirees have different unit commuting costs denoted, respectively, by θ and t. Since moving is typically more demanding for the retirees, it is reasonable to assume t > θ. However, the results remain qualitatively the same when t < θ. The inequality t > θ implies that the elderly live closer to the CBD. Their urban costs UC or are given by UC or ¼

t sr ρL θ ðλ1r þ λ2r Þð1  ρÞL þ : δ 2 δ 2

whereas workers’ urban costs are now as follows: UC r ¼

θ sr ρL θ ðλ1r þ λ2r Þð1  ρÞL þ , δ 2 δ 2

which is equal to (5) when ρ = 0. Both workers and retirees choose simultaneously their location according to their respective inter-city utility differential. The asymmetry in the amenity supply implies the following elderly’s equilibrium condition V oA  V oB ¼ a with V or ¼ n1 þ n2r þALR=ρ  UC or . It can be shown that the equilibrium distribution of the elderly between cities is the same when ρL is sufficiently large: sA ¼

1 aδ þ : 2 ðt  θÞρL

(26)

Otherwise, sA ¼ 1. In both cases, more than half of the retirees choose to live in the amenity-city. When the share of the elderly people is not too high, the economy displays two stable equilibria. In the former one, the working-city remains dominant; in the latter one, the amenity-city is the primate city. This could explain why there are

1216

C. Gaigné and J.-F. Thisse

contradicting opinions regarding the evolution of urban systems in aging nations. The pattern in which the working-city remains the primate city, while the other city accommodates the larger share of retirees, is the equilibrium that agrees with current empirical evidence. The corresponding distribution of workers between sectors and cities is as follows: λ1B

pffiffiffiffiffiffi   1 ð2δ  θϕ2 Þ Δa ρ 1  s  ¼ þ þ 4 1ρ A 2 4θϕ1 ϕ2

λ2B

pffiffiffiffiffiffi Δa 1 ¼ þ 4 4ϕ1

(27)

where Δa  ϕ1 ðϕ1 þ 2ϕ2 Þ 

2θϕ1 ϕ22 < Δ: δð1  ρÞ

The expression (27) shows how the elderly’ location choices affect the distribution of activities. It follows from (26) and (27) that workers and retirees are not attracted by the same city. This provides a rationale for recent empirical evidence, which suggests that retirees and workers tend to live separately as the old-age dependency ratio increases. Furthermore, when city A’s local government improves its amenity supply (a"), this city attracts a growing number of retirees, which allows the number of jobs in the working-city to rise. Moreover, an aging population (ρ") induces the dispersion of services at the expense of the working-city. In other words, an increasing share of retirees may challenge the efficiency of the working-city. As a result, if the agglomeration of manufactures and services generates benefits not taken into account in the model, the economy will incur efficiency losses. However, beyond some threshold the migration of retirees toward the amenity-city raises the level of urban costs and/or decrease the supply of local services in city A. This restores partially the attractiveness of the working-city. Nevertheless, this need not be true for services which still benefit from a big market in the elderly-city. Finally, of old-age dependency ratio, the  regardless  working-city remains the larger one λB > λA . To sum up, though in an aging nation the relocation of consumption services weakens the supremacy of working-cities, these ones remain the biggest ones. Indeed, as long as it is more profitable for the bulk of manufactures to congregate, a large share of services is prompted to set up therein. In addition, as the population gets older, cities diverge in their job and demographic structures. Yet, the supply of consumption services in both cities should prevent the complete spatial separation of workers and retirees.

5.4

Do New Information and Communication Technologies Foster the Death of Cities?

Besides transport costs, geographical separation generates another type of spatial friction, that is, communication costs. Decreasing communication costs allow the


slicing up of the supply chain by reducing the costs of coordinating production across space and time zones. As a result, firms are able to disperse their functions, such as management, finance, R&D, and production, into geographically separated units in order to benefit from the attributes specific to different locations. This topic has been largely overlooked in NEG. By contrast, it has attracted a lot of attention in the theory of the multinational enterprise.

To illustrate the interplay between communication and transport costs, consider a monopoly formed by a headquarters (HQ) and one or two production plants. The two regional markets can be supplied according to one of the following organizational forms. First, the firm is integrated when it operates a single plant located in the same city as the HQ and ships its output to the other city. This organizational configuration corresponds to the case studied in Sect. 2. Second, the firm chooses a horizontal structure where a plant is set up in each city. In other words, rather than exporting, the firm serves the remote city by establishing a subsidiary in the corresponding market. This implies an additional investment $\phi$ and a communication cost $\zeta$ between the HQ and the plant established in the distant city. In this new setting, agglomeration occurs when the monopolist's activities are spatially integrated, while dispersion prevails when the firm operates two spatially separated plants.

The profit of an integrated firm is given by (6), whereas the profit of a horizontal firm is equal to

$$\pi^H = p_A q_A \lambda_A L + (p_B - \zeta) q_B \lambda_B L - \phi w_A - \phi w_B.$$

Consumers' utility is given by (1) where γ = 0 because the firm is a monopoly. Hence, the inverse demand function in city r becomes $q_r = 1/\beta - p_r/\beta$. The monopoly first chooses its organizational form and, then, the consumer price in each city. The profit-maximizing prices of an integrated firm are given by $p_A = 1/2$ and $p_B = (1+\tau)/2$, respectively. The firm charges prices equal to $p_A^H = 1/2$ and $p_B^H = (1+\zeta)/2$ when it chooses to go horizontal. As expected, higher communication costs increase the price paid by consumers living in city B. The firm chooses to go horizontal if and only if $\pi^H > \pi^I$ or, equivalently, $\phi w_B < \lambda_B L\left[(1-\zeta)^2 - (1-\tau)^2\right]/(4\beta)$, which can hold only if $\zeta < \tau$.

Third, the firm may choose a vertical structure: the HQ remains in city A, while the plant is set up in city B, where the marginal production cost is lower, the corresponding cost saving $m_B > 0$ being paid to the firm. In this case, the firm weighs the cost of offshoring the production of its product against the cost saving from producing at a distant place where production costs are lower. The profit of a vertical firm is as follows:

$$\pi^V = (p_A - \tau - \zeta + m_B) q_A \lambda_A L + (p_B - \zeta + m_B) q_B \lambda_B L - \phi w_B.$$

The profit-maximizing prices are $p_A^V = (1 + \tau + \zeta - m_B)/2$ and $p_B^V = (1 + \zeta - m_B)/2$. The firm chooses to be vertical if and only if $\pi^V > \pi^I$; rearranging this condition shows that the threshold on $\phi w_B$ combines a term that is always positive with the term $(\lambda_A - \lambda_B)L\tau - \zeta$, which can be either positive or negative. Clearly, the firm will choose to be vertical when transport and communication costs are low relative to the comparative advantage.

This example captures the main trade-offs described in Fujita and Thisse (2006), who revisited the core-periphery model when firms are free to choose to be integrated or vertical. The core region hosts the headquarters, while wages are lower in the periphery. When communication costs are high, reducing transport costs leads firms to be integrated. However, things are different when communication costs are low. For high transport costs, most plants are set up in the core. However, once transport costs fall below some threshold, decreasing communication costs trigger the relocation of activities and the core loses a growing share of plants. This is because the vertical firms are able to benefit from lower production costs in the periphery without losing much business from the core. Eventually, when communication costs have reached a sufficiently low level, the economy ends up with a deindustrialized core, which retains firms' strategic functions. In other words, decreasing transport costs incite firms to be integrated, whereas decreasing communication costs incite firms to become fragmented.

This prediction concurs with Baldwin (2016), who argues that big drops in transport costs and in communication costs are at the origin of two different types of globalization. Transport costs imply that selling at a distance is more expensive, but these costs do not affect sales on the domestic market. By contrast, producing at a distance imposes communication costs that affect the whole volume of production, while shipping the output from the plant to the firm's home market is still required. This is why the spatial fragmentation of the firm needs low communication and transport costs, which is precisely what Fujita and Thisse (2006) have shown.
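To make the comparison concrete, the following sketch (our illustration, not part of the chapter) evaluates the three profit functions above for a grid of communication costs ζ and reports the profit-maximizing organizational form; all parameter values, including the assumption that wages are lower in city B, are illustrative assumptions of ours.

```python
# Organizational-form choice of the monopolist; parameter values are assumed.
beta = 1.0                      # slope of the inverse demand q_r = (1 - p_r)/beta
L, lamA, lamB = 1.0, 0.6, 0.4   # population size and city shares
tau, phi = 0.3, 0.05            # transport cost and fixed plant requirement
wA, wB, mB = 1.0, 0.8, 0.15     # wages and marginal production-cost saving in B

def q(p):
    """Demand per consumer implied by the quadratic utility with gamma = 0."""
    return max((1.0 - p) / beta, 0.0)

def profit_integrated():
    pA, pB = 0.5, (1.0 + tau) / 2.0            # optimal prices of an integrated firm
    return pA * q(pA) * lamA * L + (pB - tau) * q(pB) * lamB * L - phi * wA

def profit_horizontal(zeta):
    pA, pB = 0.5, (1.0 + zeta) / 2.0           # a subsidiary in B replaces exports
    return pA * q(pA) * lamA * L + (pB - zeta) * q(pB) * lamB * L - phi * (wA + wB)

def profit_vertical(zeta):
    pA = (1.0 + tau + zeta - mB) / 2.0         # the plant in B ships back to city A
    pB = (1.0 + zeta - mB) / 2.0
    return ((pA - tau - zeta + mB) * q(pA) * lamA * L
            + (pB - zeta + mB) * q(pB) * lamB * L - phi * wB)

for zeta in [0.05, 0.15, 0.25, 0.35]:
    profits = {"integrated": profit_integrated(),
               "horizontal": profit_horizontal(zeta),
               "vertical": profit_vertical(zeta)}
    print(f"zeta = {zeta:.2f} -> {max(profits, key=profits.get)}  "
          + "  ".join(f"{k} = {v:.4f}" for k, v in profits.items()))
```

With these numbers the firm is vertical at low ζ and integrated at high ζ, in line with the discussion above; the point of the sketch is only to show how the three profit expressions interact, not to calibrate them.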

6 Conclusion

The idea of spatial interaction is central to regional science. Broadly defined, spatial interaction refers to flows across space that are subject to various types of spatial frictions, such as traded goods, migrations, capital movements, interregional grants, remittances, and the interregional transmission of knowledge and business cycle effects. Though the NEG literature has for the most part focused on the mobility of goods and production factors, all these types of spatial interaction are at the heart of NEG. Instead of writing one more review of the vast literature produced in the footsteps of Krugman (1991), we have chosen to highlight the role that NEG may play in understanding the process of urban development. More specifically, through several major trade-offs we have covered a range of issues that highlight the working of urban systems. To do so, we have used very simple models, which contrast sharply with the heavy mathematical apparatus employed in the literature.

Cities of the twenty-first century face new and important challenges, such as climate change, aging population, crime, poverty, social exclusion, food security, the supply and management of transportation and communication infrastructure, and competition with a few global cities. It is, therefore, fundamental to have sound theoretical models that can be used as guidelines in developing empirical research and designing new policies. Is NEG a useful tool? We believe that the answer is yes. From the methodological standpoint, NEG has several major merits. First, the decisions made by firms and households are based on land rents, wages, and prices, which are themselves endogenous and related to the size and structure of cities. Second, NEG takes into account the fact that households and firms may relocate between and within cities in response to major changes in their economic environment. Last, NEG is connected with various fields of modern economics, which provide a set of tools and concepts that make it possible to tackle new and challenging issues.

Nevertheless, NEG suffers from a major drawback: it is built on a two-location setting. Yet, it is well known that a firm's location is the balance of a system of forces pulling the firm in various directions. The new fundamental ingredient that a multilocation setting brings about is that spatial frictions between any two cities are likely to be different. As a consequence, the relative position of a city within the whole network of interactions matters. Another key insight one can derive in a multilocation economy is that any change in the underlying parameters has in general complex impacts that vary in non-trivial ways with the properties of the underlying transport network. When there are only two locations, any change in structural parameters necessarily affects directly either one of the two cities, or both. On the contrary, when there are more than two locations, any change in parameters that directly involves only two cities generates spatial spillover effects that are unlikely to leave the remaining cities unaffected. Spatial quantitative models, which blend ingredients from NEG and urban economics, could well provide a solution to this quandary. More work is called for, but one should not expect a silver bullet to solve the dimensionality problem.


We conclude with the following remark. So far, the Internet and fast, safe travel have not proven to be a good substitute for physical proximity in activities where knowledge spillovers, the transfer of information, and trust are critical. At least for now, face-to-face contacts remain the best way to deal with the communication curse. That said, it is hard to predict how place-based constraints will evolve with the development of new digital technologies. Nevertheless, it is worth noting with Goldfarb and Tucker (2019) that the Internet might well be a complement to cities because of the spatial correlation that characterizes many social networks.

7 Cross-References

▶ Agglomeration and Jobs
▶ Classical Contributions: Von Thünen and Weber
▶ Cities, Knowledge, and Innovation
▶ Commuting, Housing, and Labor Markets
▶ New Geography Location Analytics
▶ Spatial Equilibrium in Labor Markets
▶ Supply Chains and Transportation Networks

Acknowledgments We thank P. Asadi, J. Cavailhes, M. Fujita, T. Gokan, S. Kichko, F. Moizeau, S. Riou, and T. Tabuchi for comments and discussions. This work is supported by the Russian Science Foundation (Project No. 18-18-00253).

References

Albouy D, Ehrlich G (2018) Housing productivity and the social cost of land-use restrictions. J Urban Econ 107:101–120
Albouy D, Ehrlich G, Shin M (2018) Metropolitan land values. Rev Econ Stat 100:454–466
Baldwin RE (2016) The great convergence. Harvard University Press, Cambridge, MA
Behrens K, Robert-Nicoud F (2015) Agglomeration theory with heterogeneous agents. In: Duranton G, Henderson JV, Strange WC (eds) Handbook of regional and urban economics, vol 5. Elsevier, New York, pp 171–245
Carlino G, Kerr WR (2015) Agglomeration and innovation. In: Duranton G, Henderson JV, Strange WC (eds) Handbook of regional and urban economics, vol 5. Elsevier, New York, pp 349–404
Combes P-P, Gobillon L (2015) The empirics of agglomeration economies. In: Duranton G, Henderson JV, Strange WC (eds) Handbook of regional and urban economics, vol 5. Elsevier, New York, pp 247–348
Cuberes D, Roberts J, Sechel C (2019) Household location in English cities. Reg Sci Urban Econ 75:120–135
Duranton G, Puga D (2004) Micro-foundations of urban increasing returns: theory. In: Henderson JV, Thisse J-F (eds) Handbook of regional and urban economics, vol 4. Elsevier, Amsterdam, pp 2063–2117
Fujita M (1989) Urban economic theory. Land use and city size. Cambridge University Press, Cambridge
Fujita M, Thisse J-F (2006) Globalization and the evolution of the supply chain: who gains and who loses? Int Econ Rev 47:811–836


Fujita M, Thisse J-F (2013) Economics of agglomeration. Cities, industrial location and globalization, 2nd edn. Cambridge University Press, Cambridge
Fujita M, Krugman P, Venables AJ (1999) The spatial economy. Cities, regions and international trade. The MIT Press, Cambridge, MA
Gaigné C, Riou S, Thisse J-F (2012) Are compact cities environmentally friendly? J Urban Econ 72:123–136
Gaigné C, Zenou Y (2015) Agglomeration, city size and crime. Eur Econ Rev 80:62–82
Glaeser EL, Kolko J, Saiz A (2001) Consumer city. J Econ Geogr 1:27–50
Goldfarb A, Tucker C (2019) Digital economics. J Econ Lit 57:3–43
Handbury J, Weinstein D (2015) Goods prices and availability in cities. Rev Econ Stud 82:258–296
Helpman E (1998) The size of regions. In: Pines D, Sadka E, Zilcha I (eds) Topics in public economics. Theoretical and applied analysis. Cambridge University Press, Cambridge, pp 33–54
Krugman P (1991) Increasing returns and economic geography. J Polit Econ 99:483–499
Tabuchi T, Thisse J-F (2006) Regional specialization, urban hierarchy, and commuting costs. Int Econ Rev 47:1295–1317
Tabuchi T, Thisse J-F, Zhu X (2018) Does technological progress magnify regional disparities? Int Econ Rev 59:647–663

New Economic Geography After 30 Years

Steven Brakman, Harry Garretsen, and Charles van Marrewijk

Contents
1 Introduction
2 Increasing Returns and Intra-industry Trade
2.1 Basic Ingredients of the Model
2.2 The Home Market Effect as a Volume Effect
2.3 The Home Market Effect as a Factor Price Effect
3 Adding Labor Mobility
3.1 Core New Economic Geography Model
3.2 Model Extensions
4 Empirical Testing
5 Policy Consequences
6 Conclusions
7 Cross-References
References

Abstract

In this chapter we briefly discuss how the New Economic Geography literature follows and builds on international trade theory. We then turn to the main empirical implications of New Economic Geography. We highlight that the main problem with empirical applications of New Economic Geography is that a single test of the implications of the model is elusive because of the structure of the model. As a result, the main consequences of the model are usually tested separately. Some of the implications of the model are also consistent with other


models, notably models in urban economics. We therefore stress that despite an initial surge in empirical research inspired by New Economic Geography, as well as improved methods, the empirical evidence still remains rather sketchy. Moreover, up to now policy advice based on New Economic Geography is mostly qualitative.

Keywords

Transport cost · Large market · Nominal wage · Home market effect · Shock sensitivity · New economic geography

1 Introduction

The Nobel prize committee that awarded the Nobel prize in economics to Paul Krugman in 2008 stressed that the award was essentially given to him for his contributions in (mainly) three papers in two disciplines: international trade and economic geography. The prize committee of the Royal Swedish Academy of Sciences stated in its scientific background report (p. 1): "Traditionally, trade theory and economic geography evolved as separate subfields of economics. More recently, however, they have converged [to] become more and more united through new theoretical insights, which emphasize that the same basic forces simultaneously determine specialization across countries for a given international distribution of factors of production (trade theory) and the long-run location of those factors across countries (economic geography)" (http://nobelprize.org/nobel_prizes/economics/laureates/2008/index.html).

Krugman (1979, 1980) deal with international trade (notably intra-industry trade), whereas Krugman (1991) extends the analysis of the first two papers by endogenizing the spatial allocation of economic activity. Both contributions became workhorse models for the two disciplines: the monopolistic competition model became the standard international trade reference for models incorporating intra-industry trade, and the extension of this model, by allowing for factor mobility, became the core model of New Economic Geography (also known as Geographical Economics, a term we prefer).

Krugman (1991) extended the concept of the spatial equilibrium. According to Glaeser (2008, p. 4), the spatial equilibrium is the "single most important concept in urban or regional economics. . .if identical people are choosing to live in two different places then those two different places must be offering an equivalent bundle of advantages, like wages, prices and amenities. Essentially, there must be no potential for arbitrage over space." In urban economics, two types of spatial equilibria are routinely distinguished. First, a spatial equilibrium within cities that describes a trade-off between housing costs and transportation costs (the origins of this model go back to the von Thünen model). Closer to Central Business Districts housing costs are


high but commuting costs are low. In equilibrium these costs are balanced and no one has an incentive to relocate within cities. Second, an equilibrium between cities (Henderson 1974). These models describe a trade-off between increasing returns that rise with city size (higher nominal wages) and all sorts of negative externalities that also increase with city size, such as congestion, high housing costs, and pollution. In equilibrium, different cities offer identical net wages, that is, nominal wages corrected for negative externalities, and no one has an incentive to relocate between cities. Despite the sophistication of the latter type of models, which explain the sectoral specialization of different cities, they have one big disadvantage: cities are modelled as floating islands. It does not matter where on the map the cities are located. In this sense geography is usually not modelled in models based on Henderson (1974). In Krugman (1991) this omission is tackled head-on and is central to his equilibrium. Cities are interrelated and geography is crucial; it is of fundamental importance where on the map cities are located, and location is associated with transportation costs to the other cities. The price that has to be paid for this extension is that different specialization patterns regarding different types of industries are usually not modelled.

In this chapter, almost 30 years after Krugman (1991), we will highlight the main characteristics of New Economic Geography. In doing so we will not only explain the fundamentals of New Economic Geography and trace its origins to international trade theory, but also mention some of the more recent developments. Next, we will illustrate the current state of affairs with respect to the empirical evidence for New Economic Geography and, related to this, the policy consequences of the model.

Three features stand out. First, the combination of increasing returns to scale, imperfect competition, and transport costs gives rise to the so-called Home Market effect. Second, the combination of the Home Market effect with interregional labor mobility endogenizes the location decisions of firms and footloose workers and hence the spatial allocation of both supply and demand. This set-up allows for multiple equilibria, one of which is a core-periphery equilibrium. This explains why the model has also been used in urban economics to explain, for example, a system of cities. Third, despite a large and increasing literature on empirical evidence, a convincing empirical test of New Economic Geography is still missing. This also implies that policy advice based on the model should be handled with care; so far the basic policy contributions of New Economic Geography are of a qualitative nature.

In this chapter we will focus on the three aforementioned issues. A chapter like this is too short to provide a full survey, but the key issues will be introduced and explained (see reference list). In essence, by discussing the three topics, we will stress the most important contributions of New Economic Geography and explain the tug of war between the agglomeration and spreading forces that are active in New Economic Geography models, together with their potential empirical and policy implications.


2 Increasing Returns and Intra-industry Trade

2.1 Basic Ingredients of the Model

During the 1970s it became increasingly clear that the standard workhorse models of international trade were at odds with the facts. The Heckscher-Ohlin and the Ricardian model give a rationale for inter-industry trade only. Empirical research (Grubel and Lloyd 1975), however, clearly showed that trade between (developed) countries was mainly in the form of intra-industry trade. The bulk of trade is in similar goods between similar countries, a puzzling phenomenon in neoclassical trade models. The theoretical challenge was to come up with a trade model that allowed for intra-industry trade. A possible explanation should center on the role of increasing returns to scale and on an imperfectly competitive market structure. In Krugman (1979) a simplified version of the monopolistic competition model, as developed by Dixit and Stiglitz (1977), is introduced (see also Dixit and Norman (1980)). The Dixit-Stiglitz model provides a fruitful way to model monopolistic competition. Almost instantly it became the preferred choice of researchers to model monopolistic competition, and it has become the benchmark model in various fields (see Brakman and Heijdra (2004) for a survey of contributions). We give a simplified version of the model below (the discussion is based on Brakman and Garretsen (2009)).

2.1.1 Demand
Household utility is characterized by a love-of-variety effect that assumes that each variety $c_i$, $i = 1, \ldots, N$ enters utility U symmetrically as an incomplete substitute; H is a homogeneous commodity which can serve as a numéraire, and M is often referred to as manufacturing:

$$U = H^{1-\delta} M^{\delta}, \quad \text{where } M = \left(\sum_{i=1}^{N} c_i^{\rho}\right)^{1/\rho} \text{ and } 0 < \rho \equiv 1 - \frac{1}{e} < 1, \qquad (1)$$

where δ is the Cobb-Douglas share of M and (1 − δ) is the Cobb-Douglas share of H. The elasticity of substitution between varieties e is determined by the parameter ρ. If the number of varieties is (very) large, firms consider the elasticity of demand e as given. Utility maximization of Eq. (1) subject to the budget constraint gives (for a step-by-step derivation see Brakman et al. (2019), Chap. 7):

$$c_i = \frac{p_i^{-e}}{\sum_{j=1}^{N} p_j^{1-e}}\, \delta w L \qquad (2)$$

where $p_i$ is the price of variety i and N is the number of varieties. The term in the denominator is related to the price index for the manufactured good. In what follows we assume that there is only one factor of production, labor L, with wage rate w.
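Equation (2) is easy to verify numerically. The following minimal sketch (ours; the parameter values are illustrative assumptions) computes the demand for each variety and checks that total spending on manufactures exhausts the budget share δwL.

```python
# CES demand per variety from Eq. (2); parameter values are assumed.
delta, w, L, e = 0.6, 1.0, 100.0, 4.0
prices = [1.0, 1.0, 1.2]                     # three varieties, one more expensive

denom = sum(p ** (1.0 - e) for p in prices)  # the price-index term in Eq. (2)
demand = [p ** (-e) * delta * w * L / denom for p in prices]

for i, (p, c) in enumerate(zip(prices, demand), start=1):
    print(f"variety {i}: price = {p:.2f}, demand = {c:.2f}")

# Spending on all varieties together equals the manufacturing budget delta*w*L:
print(f"check: sum p_i*c_i = {sum(p * c for p, c in zip(prices, demand)):.2f}")
```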


2.1.2 Supply
For the explanation of intra-industry trade, it is necessary that similar goods are produced in different places. Intra-industry trade follows immediately in a multi-country setting if all varieties are consumed by all consumers (love-of-variety effect). A simple way to introduce (internal) economies of scale and to ensure that each variety i is produced by a single firm is via the following labor cost function:

$$l_i = \alpha + \beta x_i, \quad \text{where } \alpha, \beta > 0. \qquad (3)$$

Labor $l_i$ is the only production factor, which earns a wage w. The parameters α and β determine the fixed and marginal costs, αw and βw respectively (the fixed costs give rise to the internal scale economies). Equation (3) implies that average costs are decreasing in the quantity of variety i that is produced, and this ensures that in the competitive equilibrium a particular variety is produced by the firm that initially had the largest market share and thus the lowest costs per unit of production. The full-employment condition requires that the summation of Eq. (3) over all varieties equals total labor supply:

$$L = \sum_{i=1}^{N} l_i = \sum_{i=1}^{N} (\alpha + \beta x_i). \qquad (4)$$

Firms are defined symmetrically, which implies that $p_i = p$ and $x_i = x$ for all i in equilibrium. So the equilibrium number of varieties N can easily be calculated from:

$$L = N(\alpha + \beta x). \qquad (5)$$

2.1.3 Equilibrium
The next step is to derive the market equilibrium. This gives the equilibrium output of each firm $x_i$, the equilibrium number of varieties N (and hence the equilibrium number of firms), and the equilibrium price-wage ratio $p_i/w$. Profit maximization gives the familiar mark-up pricing rule, equating marginal revenue to marginal costs (dropping the index because of symmetry):

$$p = \frac{e}{e-1}\,\beta w \quad \text{or} \quad \frac{p}{w} = \frac{e}{e-1}\,\beta. \qquad (6)$$

The zero profit condition implies that:

$$0 = px - (\alpha + \beta x)w \quad \text{or} \quad \frac{p}{w} = \beta + \frac{\alpha}{x} = \beta + \frac{\alpha}{L c_i}. \qquad (7)$$

Equations (6) and (7) together give the break-even output x of a firm that is consistent with profit maximization and free entry and exit into the market: $x = (e-1)\alpha/\beta$.
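The symmetric equilibrium can be computed in a few lines. The sketch below (ours; the parameter values are arbitrary assumptions) illustrates that the break-even output per firm is pinned down by technology and the elasticity of substitution alone, so a larger labor force shows up entirely as a larger number of varieties, which is the love-of-variety gain from a larger market discussed next.

```python
# Symmetric Dixit-Stiglitz equilibrium from Eqs. (5)-(7); values are assumed.
alpha, beta, e, L = 2.0, 0.5, 4.0, 1000.0

x = (e - 1.0) * alpha / beta        # break-even output per firm, from (6) and (7)
labor_per_firm = alpha + beta * x   # Eq. (3); equals alpha * e
N = L / labor_per_firm              # equilibrium number of varieties, Eq. (5)
p_over_w = e / (e - 1.0) * beta     # mark-up pricing rule, Eq. (6)

print(f"output per firm x   = {x:.2f}")
print(f"labor per firm      = {labor_per_firm:.2f} (= alpha*e = {alpha * e:.2f})")
print(f"number of varieties = {N:.0f}")
print(f"price-wage ratio    = {p_over_w:.3f}")
# Doubling L leaves x and p/w unchanged and doubles N:
print(f"with labor 2L: N = {2 * L / labor_per_firm:.0f}")
```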


2.1.4 International Trade
The gains of international trade are present in the model outlined above, but only in a rudimentary way. An increase in the available labor supply shifts the average cost curve downward. This shift has implications for the number of varieties that are produced, which increases (see Eq. (5): $L/l_i = N$), but has no impact on other elements in the model. Consumers gain from trade because they consume more varieties than before international trade was allowed.

More interesting results can be derived by introducing transport costs. This is certainly true from a New Economic Geography perspective because the relevance of economic geography crucially hinges upon the presence of positive transport costs; without transport costs geography does not matter. The combination of increasing returns to scale and transport costs implies that firms not only want to concentrate production in a single location (because of increasing returns to scale) but also care where in space they locate production (because of the transport costs). Firms prefer to locate where demand for the variety they produce is relatively large. This interplay between increasing returns to scale, transport costs, and demand has become known as the Home Market effect, which is also the basis for the New Economic Geography literature. Our discussion of the Home Market effect is in two parts: the more than proportional production of the increasing returns sector in the larger market (the volume effect, Sect. 2.2), and the higher wages of the increasing returns sector in the larger market (the price effect, Sect. 2.3). We introduce this distinction because the two versions have important consequences for empirical tests of the model that are not always taken into consideration.

2.2 The Home Market Effect as a Volume Effect

Iceberg transportation costs have the advantage that transportation costs can be introduced without having to deal with a transportation sector (see Fingleton and McCann (2007) for a discussion). Assume the iceberg costs are τ ≥ 1; that is, τ units have to be shipped in order for one unit to arrive in the other country. This raises the costs of imported varieties to pτ. Demand for a domestic variety now comes from two sources: domestic demand (8a) and foreign demand (8b). From Eq. (2) it can be inferred that these two expressions are (where * indicates Foreign variables):

$$x_i = \frac{p^{-e}}{N p^{1-e} + N^{*}(\tau p^{*})^{1-e}}\, \delta w L \qquad (8a)$$

$$x_i^{*} = \frac{(\tau p)^{-e}}{N(\tau p)^{1-e} + N^{*} p^{*\,1-e}}\, \delta w^{*} L^{*} \qquad (8b)$$


Similar equations can be derived for the Foreign country. From the discussion following Eqs. (6) and (7) we know that output per firm is fixed and equal to x in equilibrium. Goods market clearing in the increasing returns sector is given for the Home country in Eq. (9a) and for the Foreign country in Eq. (9b):

$$X \equiv Nx = \frac{N p^{-e}}{N p^{1-e} + N^{*}(\tau p^{*})^{1-e}}\,\delta w L + \frac{N(\tau p)^{-e}}{N(\tau p)^{1-e} + N^{*} p^{*\,1-e}}\,\delta w^{*} L^{*}\,\tau \qquad (9a)$$

$$X^{*} \equiv N^{*}x = \frac{N^{*}(\tau p^{*})^{-e}}{N p^{1-e} + N^{*}(\tau p^{*})^{1-e}}\,\delta w L\,\tau + \frac{N^{*} p^{*\,-e}}{N(\tau p)^{1-e} + N^{*} p^{*\,1-e}}\,\delta w^{*} L^{*} \qquad (9b)$$

Note the additional τ multiplication terms in both expressions, and also note that the output level of individual firms in both countries is x. In Eq. (9a) part of the Home exports to Foreign melts during transportation, but it needs to be produced before it can melt, and similarly in Eq. (9b) for exports from Foreign to Home; hence the additional multiplication by τ. Assume first that there are no transport costs with respect to the homogeneous sector H and second (as is standard in international trade theory) that labor is mobile between sectors but immobile between countries. It follows that wages in the H sectors in both countries are identical and, because of perfect inter-sector labor mobility, so are wages in the increasing returns sector. Equation (6) allows us to choose units such that p = w = 1. This implies that we can simplify Eqs. (9a) and (9b) as follows (with $Z \equiv \tau^{1-e}$; this term is known as the "free-ness of trade"):

$$\frac{x}{\delta} = \frac{1}{N + N^{*}Z}\,L + \frac{Z}{NZ + N^{*}}\,L^{*} \qquad (9a')$$

$$\frac{x}{\delta} = \frac{Z}{N + N^{*}Z}\,L + \frac{1}{NZ + N^{*}}\,L^{*} \qquad (9b')$$

We have two equations and two unknowns (N and N*). In principle we have three possible cases (numbered a, b, c), namely complete specialization in one of the two countries (cases a and b) and incomplete specialization (case c):

a. $N = 0$, $N^{*} = \delta(L + L^{*})/x$ from Eq. (9b')
b. $N = \delta(L + L^{*})/x$, $N^{*} = 0$ from Eq. (9a')
c. $N = \dfrac{\delta(L - ZL^{*})}{(1-Z)x}$, $N^{*} = \dfrac{\delta(L^{*} - ZL)}{(1-Z)x}$ from Eqs. (9a') and (9b').

Concentrating on the Home country we can distinguish between these three possibilities. If we introduce the notation $s_l = L/(L + L^{*})$ and $s_N = N/(N + N^{*})$, where $s_l$ is the labor share and $s_N$ the share of varieties or firms in Home, we arrive at:


$$s_N = \begin{cases} 0 & \text{if } s_l \le \dfrac{Z}{1+Z} \\[4pt] \dfrac{(1+Z)s_l - Z}{1-Z} & \text{if } \dfrac{Z}{1+Z} < s_l < \dfrac{1}{1+Z} \\[4pt] 1 & \text{if } s_l \ge \dfrac{1}{1+Z} \end{cases} \qquad (10)$$

The first entry in Eq. (10) follows from combining case a with case c (where specialization of all increasing returns production in Foreign just becomes binding). Similarly, the last entry follows from combining cases b and c. Finally, the middle entry follows from solving case c. The implications are illustrated in Fig. 1. What we see is that if the Home country is large (small) enough in terms of labor relative to Foreign, it will attract (lose) all increasing returns manufactures. What is important in our discussion of the Home Market effect is the slope of the curve in the area of incomplete specialization, $Z/(1+Z) < s_l < 1/(1+Z)$. From Eq. (10) we know that the slope of this line-piece is $(1+Z)/(1-Z) > 1$, which implies that the larger country in this area has a more than proportional share of varieties, and hence firms, compared to its share in labor.

The reasoning is as follows. Suppose that from the point (½, ½) a Foreign firm (together with its workers) relocates to the Home country, which now becomes the larger market (the reason why this might take place is not important). This increases the market by the amount of workers that move, but it also increases the spending power of existing consumers, who no longer have to incur the transport costs that result from importing the variety. This 'double' increase in demand raises profits in the larger market and attracts more firms to the increasing returns sector.

Fig. 1 Home Market effect: the volume effect; Home's share of firms $s_N$ against its labor share $s_l$. (Source: authors' calculation, based on Helpman and Krugman (1985)). Note: picture drawn for τ = 1.2 and e = 4, so Z = 0.579
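Equation (10) can be reproduced numerically. The short sketch below (ours) uses the parameter values reported in the note to Fig. 1 and shows the more than proportional response of Home's firm share to its labor share in the interior regime.

```python
# Home's equilibrium share of firms s_N as a function of its labor share s_l,
# Eq. (10), with the Fig. 1 parameters tau = 1.2 and e = 4 (so Z ~ 0.579).
tau, e = 1.2, 4.0
Z = tau ** (1.0 - e)                          # free-ness of trade

def s_N(s_l, Z):
    lo, hi = Z / (1.0 + Z), 1.0 / (1.0 + Z)   # full-specialization thresholds
    if s_l <= lo:
        return 0.0                            # all increasing-returns firms in Foreign
    if s_l >= hi:
        return 1.0                            # all increasing-returns firms in Home
    return ((1.0 + Z) * s_l - Z) / (1.0 - Z)  # interior: slope (1+Z)/(1-Z) > 1

print(f"Z = {Z:.3f}, thresholds = {Z / (1 + Z):.3f} and {1 / (1 + Z):.3f}")
for s_l in [0.30, 0.40, 0.50, 0.55, 0.60, 0.70]:
    print(f"s_l = {s_l:.2f} -> s_N = {s_N(s_l, Z):.3f}")
```

For instance, at $s_l = 0.55$ the sketch returns $s_N \approx 0.69$: a labor share only slightly above one half already commands a clearly more than proportional share of firms.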


Points on the solid line indicate that the increase in the number of firms must be more than proportional to the increase in the number of workers (some workers come from the homogeneous sector) in order to restore equilibrium. Why don't all firms move to the larger market? The reason is that additional firms also introduce more competition, which reduces the (potential) profits in the larger market. To explore the thought experiment of making the Home market larger, it is instructive to look at the denominator of Eq. (8a). A firm moving from Foreign to Home makes the denominator smaller (as the variety no longer has to be imported), and this implies more local competition. This competition effect is stronger the higher are transport costs (high transport costs shield a market from Foreign competition). So fewer firms have to move to re-establish equilibrium following the movement of a firm from Foreign to Home if transport costs are high (the slope of the line gets closer to the diagonal).

To sum up, countries or regions with a relatively large demand for a good host a more than proportional share of the production of that good. Against this Home Market or market size effect, the competition effect acts to ensure that in equilibrium, and depending on the model's parameters (notably on the level of the transport cost index Z), not all firms in the differentiated increasing returns to scale sector need to end up choosing the larger market as their location. From an empirical point of view, the model gives rise to a testable hypothesis with respect to international trade flows: countries with a relatively large Home market for variety i are, ceteris paribus, net exporters of this variety. In the trade literature (see for example Davis and Weinstein (2003)), this implication of the Home Market effect has been subjected to a series of tests (see below).

Three other observations are relevant concerning the Home Market effect. The first one is that the effect is quite sensitive to the underlying assumptions. If international trade in the homogeneous good is also subject to transport costs, the Home Market effect ceases to exist (Davis 1998). Also, the analysis of the Home Market effect quickly gets quite complicated (or even muddled) for the case of more than two regions or countries. The second observation is that in the example of Fig. 1, a large Home demand (here, a large $s_l$) leads to an influx of firms, where the necessary labor to enable the additional production has to be released from the homogeneous sector. Given that international labor mobility is low (as we will see, this is the main difference between Krugman (1980) and Krugman (1991)), the additional demand for labor by the firms in the differentiated increasing returns to scale sector in Home fully materializes in higher production because of an infinitely elastic inter-sector labor supply. If labor supply is not perfectly elastic, at least part of the response in the larger market will be in the form of higher wages (Fujita et al. 1999, Equation (4.42); Head and Mayer 2006). As we will see next, with a less than perfectly elastic labor supply, a relatively large demand or a larger Home market translates (partly) into higher wages.

A third and final observation is that demand across locations is given. This is a direct consequence of the fact that workers and hence consumers are immobile


between locations. Any demand or market size differences are therefore exogenously given. What happens if one drops this assumption? What if not only firms but also (some) workers are mobile and can choose in which country or location they wish to live? Answering this question leads us to the center of New Economic Geography, but first we present another manifestation of the Home Market effect, in terms of wages. The reason is that migration is determined by (real) wage differences between locations, so we first need to derive an expression for (real) wages.

2.3 The Home Market Effect as a Factor Price Effect

In the example underlying Fig. 1, we (by construction) ignored any effect that market or demand size differences might have on wages. Labor was perfectly mobile between sectors but not between countries, which is the usual assumption in international trade theory. This enables us to focus on the number of varieties (firms). In Krugman (1991) an opposing case is introduced: the larger market does not attract a more than proportional share of firms, compared to its share in labor, but all benefits of a larger market now show up in terms of higher wages in the increasing returns sector. Actually, such a wage effect can already be seen as an outcome of the Krugman (1980) model; we only have to change one assumption: labor is not only immobile between countries, but now also immobile between sectors. The implications are that we no longer have factor price equalization and that the number of varieties (firms) is proportional to the given quantity of labor in the increasing returns sector (so by assumption the Home Market effect of the previous section is absent). The set-up of the model remains the same, but we can no longer take the steps that simplify Eqs. (9a) and (9b) to Eqs. (9a') and (9b'). At the same time it is true that location in the larger market offers benefits relative to location in the smaller market. Again, as in the previous section, location in the larger market implies that firms do not have to incur transport costs, which increases the spending (real income) of consumers. How does this show up in the present case? We can use Eq. (9a) to show this for the Home country (and similarly Eq. (9b) for the Foreign country). Note that as wages are not necessarily the same, prices also differ between countries. Furthermore, we have to be careful how to define income, Y and Y*, in this case (see below). Taking care of these aspects results in:

$$\frac{(e-1)\alpha}{\beta} = \frac{N p^{-e}}{N p^{1-e} + N^{*}(\tau p^{*})^{1-e}}\,\delta Y + \frac{N(\tau p)^{-e}}{N(\tau p)^{1-e} + N^{*}(p^{*})^{1-e}}\,\delta Y^{*}\tau. \qquad (11)$$

In Eq. (11) we have again used the fact that in the model mark-up pricing together with the zero profit condition fixes the break-even output of firms (see the left-hand side of Eq. (11) and the discussion following Eqs. (6) and (7)). Using $p = e\beta w/(e-1)$ and $p^{*} = e\beta w^{*}/(e-1)$, we can rewrite Eq. (11) in terms of wages in the manufacturing sector (and do the same for the Foreign country):

$$w = \rho\beta^{-\rho}\left[\frac{\delta}{(e-1)\alpha}\right]^{1/e}\left(Y P_1^{e-1} + \tau^{1-e}\,Y^{*} P_2^{e-1}\right)^{1/e} \qquad (12a)$$

$$w^{*} = \rho\beta^{-\rho}\left[\frac{\delta}{(e-1)\alpha}\right]^{1/e}\left(Y^{*} P_2^{e-1} + \tau^{1-e}\,Y P_1^{e-1}\right)^{1/e} \qquad (12b)$$

where $P_1^{1-e} = N(w/\rho)^{1-e} + N^{*}(\tau w^{*}/\rho)^{1-e}$ and $P_2^{1-e} = N(\tau w/\rho)^{1-e} + N^{*}(w^{*}/\rho)^{1-e}$ are price indices and Y and Y* are the income levels generated in Home and Foreign, respectively. These equations make sense in the following way. Wages in Home are higher if Home has a large market in terms of real income (a large $Y P_1^{e-1}$) or if it is located near a large Foreign market (a large $Y^{*} P_2^{e-1}$ combined with low transport costs or, equivalently, a high free-ness of trade $\tau^{1-e}$). The benefits of a large market are not reflected in a more than proportional share of firms relative to the labor share, but in higher wages.

3 Adding Labor Mobility

3.1 Core New Economic Geography Model

It is now only a small step to make the model a full general equilibrium model that includes labor mobility (see also Brakman and Garretsen (2009)). The only thing to add is the possibility of labor migration between regions. This implies that a region's market size becomes endogenous when migration is allowed to take place. In the 2-region setting of Krugman (1991) the equilibrium conditions of the model can be stated as follows:

$$Y = wL + 0.5L^H \qquad (13a)$$

$$Y^{*} = w^{*}L^{*} + 0.5L^H \qquad (13b)$$

$$w = \rho\beta^{-\rho}\left[\frac{\delta}{(e-1)\alpha}\right]^{1/e}\left(Y P_1^{e-1} + \tau^{1-e}\,Y^{*} P_2^{e-1}\right)^{1/e} \qquad (13c)$$

$$w^{*} = \rho\beta^{-\rho}\left[\frac{\delta}{(e-1)\alpha}\right]^{1/e}\left(Y^{*} P_2^{e-1} + \tau^{1-e}\,Y P_1^{e-1}\right)^{1/e} \qquad (13d)$$

$$P_1^{1-e} = N\left(\frac{w}{\rho}\right)^{1-e} + N^{*}\left(\frac{\tau w^{*}}{\rho}\right)^{1-e} \qquad (13e)$$

$$P_2^{1-e} = N\left(\frac{\tau w}{\rho}\right)^{1-e} + N^{*}\left(\frac{w^{*}}{\rho}\right)^{1-e} \qquad (13f)$$

$$\omega = \frac{w}{P_1^{\delta}}; \quad \omega^{*} = \frac{w^{*}}{P_2^{\delta}} \qquad (13g)$$

$$\frac{dL}{L} = -\frac{dL^{*}}{L^{*}} = \eta(\omega - \bar{\omega}), \quad \text{with } \bar{\omega} = \lambda\omega + \lambda^{*}\omega^{*} \qquad (13h)$$

The model clearly builds on (and even largely overlaps with) the international trade model of Sect. 2, but it also includes some new elements, of which interregional labor mobility is the most relevant one. Equations (13a) and (13b) are the income equations in the two regions or countries, Home and Foreign. The first term on the right-hand side indicates income earned in the increasing returns sector, which pays wages w and w* in Home and Foreign, respectively. We assume that labor in the increasing returns sector is mobile between countries but not between sectors. The distribution of labor in the homogeneous (agricultural) sector is given and does not change. Total labor supply in this sector is $L^H$ and we assume, just for simplicity, that it is equally distributed over the two countries. There are no transport costs in this sector, implying that wages earned in the homogeneous goods sector are equal in both regions; we can use this sector as the numéraire sector, so that wages in the increasing returns sector are expressed relative to wages in the homogeneous goods sector. It is important to note that we cannot do without this homogeneous goods sector in Krugman's (1991) core New Economic Geography model. It implies that even when labor in the increasing returns sector is completely agglomerated in just one of the two regions, there is always a positive (residual) demand in the other region, and firms might want to relocate to this region in order to get away from the stiffer competition in the larger region.

Equations (13c), (13d), (13e), and (13f) are already familiar from earlier sections. Equations (13g) and (13h) give the dynamics in the model. We define the real income of a worker in the increasing returns to scale sector in Eq. (13g). It is simply wages divided by the price index of all the commodities consumed (including the homogeneous good). As the increasing returns to scale sector comprises a share δ of the consumption basket, we correct for this in Eq. (13g). We also divide by the price in the homogeneous sector (raised to the power 1 − δ, the share of the homogeneous goods sector), but this does not show up because the homogeneous good is the numéraire good (and its price equals one). Equation (13h) states that labor in the increasing returns sector moves to the region with the highest real wage (λ is the share of mobile workers in the Home country). Of course, in the real world migration decisions are based on more than just real wage differences.

The model gets quite complicated: if labor moves to (say) the Home country, this changes incomes [Eqs. (13a) and (13b)], which affects nominal wages [Eqs. (13c) and (13d)] and also the price indices [Eqs. (13e) and (13f); see Brakman et al. (2019), Chap. 7 for details on the price indices]. This subsequently affects the migration decision itself. These effects are non-linear. Given the key model parameters, most importantly the value of transport costs, the balance between the agglomeration forces (Home Market effect, Price Index effect) and the spreading forces (competition effect) determines what the


equilibrium spatial allocation will be. It turns out that the model has basically three (stable) equilibria: full agglomeration in Home or Foreign, and perfect spreading. Interestingly, the model is not only characterized by multiple equilibria but also by path dependency. Figure 2 illustrates the model. The so-called Tomahawk diagram depicted in Fig. 2 shows that for a low free-ness of trade $\tau^{1-e}$ (= Z in the previous section), that is, for high transport costs τ, footloose labor is evenly spread between the two regions, but if the free-ness of trade gets high enough, that is, if transport costs get low enough, all footloose workers end up in either region 1 or region 2 in equilibrium. The solid lines indicate stable equilibria, the dashed lines unstable equilibria. The arrows indicate in which direction the incentive for firms (and footloose labor) points, depending on the value of transportation costs.

What are the forces that determine interregional migration? Three forces matter in the Krugman (1991) model: the Price Index effect, the Home Market effect, and the Extent of Competition effect. The Price Index effect stimulates agglomeration in the larger market as fewer varieties have to be imported, which saves on transport costs. This effect is magnified by the Home Market effect discussed above. If the Home Market effect results in higher wages (see Sect. 2.3), it makes the larger market more attractive. These agglomeration effects are counteracted and diminished by the Extent of Competition effect, which acts as a spreading force. If a firm moves to the larger market, the denominators in Eqs. (9a) and (9b) become smaller, which reduces the demand facing an individual firm. The more firms (and workers) there are in a region, the higher the level of competition will be. The balance between these three forces determines the direction of the arrows in Fig. 2. For low values of transport costs (high values of the free-ness of trade) the competition effect is felt less, as the price difference between markets becomes

Fig. 2 The Tomahawk diagram for the Krugman (1991) model: the share of labor in Home (vertical axis, from 0 to 1) against the free-ness of trade (horizontal axis), with the basins of attraction of the spreading equilibrium and of agglomeration in Home and in Foreign; solid lines denote stable and dashed lines unstable equilibria. (Source: Brakman et al. 2019)


smaller. Note from Fig. 2 that there is not a gradual change from one stable equilibrium to another, but instead a catastrophic change: the moment the balance between these forces tilts, the outcome is full agglomeration in one region or the other. Starting from an initial situation with a low free-ness of trade (the left part of the x-axis in Fig. 2), the point at which this happens is the so-called break point B; moving from high to low transportation costs, spreading is no longer a stable equilibrium (it breaks) once transport costs are reduced beyond this point. One could also start with very low transport costs (a high free-ness of trade) and then subsequently increase transport costs (lower the free-ness of trade) until agglomeration becomes unstable. This happens at the so-called sustain points S0 and S1 in Fig. 2. Note that in the middle part of Fig. 2 there is some overlap in the range of the free-ness of trade for which both the agglomeration and the spreading equilibrium are stable, which indicates that the model is characterized by path dependency; see Brakman et al. (2019, Chap. 7) for further discussion.
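To see this tug of war at work numerically, the system (13a)-(13f) can be solved for a given allocation λ of mobile labor, after which the real wage differential from Eq. (13g) indicates the direction of migration in Eq. (13h). The sketch below is a minimal implementation of ours under illustrative, uncalibrated parameter values (the function name and the damped fixed-point routine are our own choices): a positive real wage gap at λ > 1/2 signals that agglomeration forces dominate, while a negative gap signals that spreading is stable.

```python
# Core model, Eqs. (13a)-(13g); all parameter values are assumed, with
# delta < rho so the "no-black-hole" condition is satisfied.
e = 5.0                        # elasticity of substitution
rho = 1.0 - 1.0 / e            # = 0.8
delta = 0.4                    # expenditure share of manufactures
alpha, beta = 0.1, 0.8         # fixed and marginal labor requirements
L_M, L_H = 0.4, 0.6            # mobile (manufacturing) and immobile labor
const = rho * beta ** (-rho) * (delta / ((e - 1.0) * alpha)) ** (1.0 / e)

def real_wage_gap(lam, tau, iters=2000, damp=0.5, tol=1e-12):
    """Solve (13a)-(13f) for given lambda and tau; return omega - omega*."""
    N1 = lam * L_M / (alpha * e)           # firms are proportional to workers
    N2 = (1.0 - lam) * L_M / (alpha * e)
    w1 = w2 = 1.0
    for _ in range(iters):
        Y1 = w1 * lam * L_M + 0.5 * L_H                                  # (13a)
        Y2 = w2 * (1.0 - lam) * L_M + 0.5 * L_H                          # (13b)
        P1 = (N1 * (w1 / rho) ** (1 - e)
              + N2 * (tau * w2 / rho) ** (1 - e)) ** (1 / (1 - e))       # (13e)
        P2 = (N1 * (tau * w1 / rho) ** (1 - e)
              + N2 * (w2 / rho) ** (1 - e)) ** (1 / (1 - e))             # (13f)
        w1n = const * (Y1 * P1 ** (e - 1)
                       + tau ** (1 - e) * Y2 * P2 ** (e - 1)) ** (1 / e) # (13c)
        w2n = const * (Y2 * P2 ** (e - 1)
                       + tau ** (1 - e) * Y1 * P1 ** (e - 1)) ** (1 / e) # (13d)
        if abs(w1n - w1) + abs(w2n - w2) < tol:
            break
        w1 = damp * w1n + (1 - damp) * w1   # damped update for stability
        w2 = damp * w2n + (1 - damp) * w2
    return w1 / P1 ** delta - w2 / P2 ** delta                           # (13g)

# A positive gap at lam > 1/2 pulls more workers in (agglomeration); a
# negative gap pushes them out (spreading is stable).
for tau in [1.3, 1.7, 2.1]:
    line = ", ".join(f"gap({lam}) = {real_wage_gap(lam, tau):+.5f}"
                     for lam in [0.5, 0.6, 0.8])
    print(f"tau = {tau}: {line}")
```

Scanning λ between 0 and 1 for a grid of τ values and recording where the gap changes sign traces out a numerical version of the Tomahawk diagram in Fig. 2.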

3.2 Model Extensions

The model described in Sect. 3.1 states the essence of the New Economic Geography model as introduced by Krugman (1991). In the subsequent literature many additions have been incorporated into this framework. These extensions are often motivated by the wish to correct some of the more unlikely aspects of the model or to make the model more tractable.

3.2.1 Intermediate Goods
The model introduced above can be extended with an intermediate production sector. Assuming that labor is intra-regionally mobile but inter-regionally immobile produces more realistic results than the extreme outcome described in Fig. 2. Economic integration in this case results in real wage convergence between regions rather than divergence. The reason is that the peripheral region becomes more attractive for manufacturing production as transport costs decline, because wage differences start to dominate transportation costs (which decline during economic integration). Most important, however, is the extension introduced by Puga (1999). The model introduced above predicts that small changes in parameter values could result in sudden and dramatic changes (see the sustain and break points in Fig. 2), which seems unrealistic in practice. Puga (1999) extends the intermediate production model and assumes that the numéraire sector is no longer characterized by constant returns to scale, but instead by diminishing returns. This implies that pulling workers out of the homogeneous sector raises marginal productivity and nominal wages in that sector. This adds an additional spreading force to the model, preventing a bang-bang solution as in the standard model. Puga (1999) shows that this additional force, combined with the assumptions in Krugman and Venables (1995) and Venables (1996), is so strong that instead of the Tomahawk depicted in Fig. 2, a bell-shaped curve appears, which suggests a more gradual change from full agglomeration to complete spreading. This aspect of the Puga (1999) model makes it a preferred model in empirical research (see below).


3.2.2 Factors of Production
Another extension is to allow for more factors of production. One can assume, for instance, that manufacturing production uses high-skilled and low-skilled labor. This makes the model more in touch with reality. A surprising side effect is that the model becomes more tractable than in the standard case. If high-skilled labor is used in the fixed part of production (α in Eq. (3)), and low-skilled labor in the variable-cost part of manufacturing production (β in Eq. (3)) as well as in the production of the homogeneous sector, we no longer have to solve for low-skilled nominal wages (which can be normalized), but only for high-skilled wages. The solution is relatively straightforward. So, besides introducing more factors of production, we can now derive explicit solutions. This is not the only change to the basic model that results in analytical solutions. Ottaviano, Tabuchi, and Thisse (2002), for instance, drop the CES demand structure and introduce a quasi-linear demand structure.

3.2.3 Heterogeneous Firms
The Melitz (2003) revolution has also entered New Economic Geography. Firms in the standard models are all the same with respect to their cost structure; see Eq. (3). However, Baldwin and Okubo (2006) introduce productivity differences between firms. They show that firms line up for relocation from a small to a large market in order of firm productivity levels; more productive firms can already relocate at transport cost levels that would imply a loss for less productive firms. Models like these are important for empirical research as they point towards an empirical complication: do larger markets benefit from 'agglomeration economies' or from the fact that they are home to the more productive firms?

4 Empirical Testing

Can we test the main implications of New Economic Geography? This seems a simple question, but it has turned out to be surprisingly difficult to answer. The model has interesting consequences, but a combined test of its main or all aspects is still missing. Head and Mayer (2004, p. 2616) identified five main characteristics – slightly restated by us below and compressed into three main testable implications – that are special to New Economic Geography and could be tested to explain the facts implied by Figs. 1 and 2 (Brakman et al. 2019, Chap. 9):

(a) The Home Market effect: large regions will host a disproportional share of the imperfectly competitive industry. Such large markets are, therefore, net exporters of industries characterized by increasing returns to scale. As we discussed at some length in Sect. 2, there are also two other possible testable implications of this effect:

(a1) The volume version (recall Fig. 1): a large market potential induces factor inflows from the small to the large market. Footloose factors of production


will be attracted to those markets that pay relatively high real factor rewards. This leads to a process of circular causality.

(a2) The factor price version (Eqs. 12a and 12b): a large market potential raises local factor prices in the core relative to the periphery. An attractive market with a strong Home Market effect will increase demand for factors of production, and this raises factor rewards.

(b) Shock sensitivity: as we discussed with Fig. 2, changes in the economic environment can trigger drastic and permanent changes in the spatial distribution of economic activity.

(c) At some critical level of transport or trade costs (again see Fig. 2), a further reduction in transport costs induces agglomeration through the relocation of the footloose factors of production. This implies that more economic integration should at some point lead to (more) agglomeration of the footloose activities and factors of production.

Characteristics (a1) and (a2) describe the consequences for factors of production or factor prices once the Home Market effect is established. As explained in Fujita et al. (1999, p. 57), the equilibrium of the Krugman (1991) model implies the following equation: $dY/Y = \gamma_1\, dw/w + \gamma_2\, dL/L$, where Y is total demand for the footloose sector, w is the nominal wage rate in this sector, and L is employment in this sector, while $\gamma_1$ and $\gamma_2$ are parameters. Note that this equation also illustrates why the findings on the Home Market effect show a highly variable pattern of estimated coefficients: both wage and employment changes should be accounted for, not only employment changes as in the strict version of the Home Market effect. It shows that an increase in the demand Y for the goods from the footloose sector not only causes employment changes (the volume version of the Home Market effect), but also induces wage changes (the factor price version of the Home Market effect).

In a series of papers, Davis and Weinstein (2003) have developed an empirical methodology that enables them "to distinguish a world in which trade arises due to increasing returns as opposed to comparative advantage" (Davis and Weinstein 2003, p. 3). In general they find some support for the volume version of the Home Market effect. As shown in Brakman, Garretsen and Schramm (2006), however, both effects are typically at work. On balance, it appears that the wage channel is the main route towards spatial equilibrium (Head and Mayer 2006). This also explains why most empirical work has focused on the wage Eqs. (13c) and (13d). Despite the empirical evidence that supports the volume or factor price version of the Home Market effect, the question remains whether this evidence is a test of New Economic Geography as such; these implications are also characteristic of standard trade models. We will return to this question at the end of the present section.

One of the key elements of New Economic Geography is shock sensitivity. As illustrated by Fig. 2, small changes in parameters (in casu, the level of transport costs) can (but need not) have big consequences. This implies, for instance, that a small change in economic integration could lead to spectacular changes in the spatial distribution of economic activity. If small changes can already have large effects, one


would be inclined to think that permanent effects in the spatial distribution of economic activity can be found after large changes. The key issue is whether one can come up with real world examples of a large, temporary and exogenous shock that can act as a testing ground for the shock sensitivity hypothesis. In a seminal paper, Davis and Weinstein (2002) use the case of the allied bombing of Japanese cities during World War II (WW II) as an example of such a shock. Brakman, Garretsen, and Schramm (2004) apply the Davis and Weinstein (2002) approach to the case of the allied bombing of German cities during WW II. In both studies the question is the same: did individual cities return to their initial, pre-war growth path after WW II? The break-up of Germany in 1949 in the Federal Republic of Germany FRG (West Germany) and the German Democratic Republic GDR (East Germany) and the subsequent re-unification of the two Germanies in 1990 after the fall of the Berlin Wall is another example of a large, temporary (40 year) shock. Redding and Sturm (2008) use this shock to test whether West-German border cities (close to the FRG-GDR border) experienced a substantial decline compared to non-border cities in West Germany. The evidence of these studies is somewhat mixed. Davis and Weinstein (2002) do not find evidence of long term effects, whereas the other studies – on Germany – do find such effects. In general it seems that economies show some shock sensitivity. Again, and notwithstanding this evidence on shock sensitivity, the ultimate question for our present purposes is whether these studies do provide a real test of New Economic Geography as such. Note for instance that the New Economic Geography model as depicted by Fig. 2 allows for shocks that can have permanent and non-permanent effects. Also, it is clear that New Economic Geography is not the only location or spatial model to predict that shocks can alter the spatial equilibrium allocation of economic activity. Finally, New Economic Geography models predict that changes in transportation costs could result in changes in the degree of agglomeration through the relocation of the mobile factors of production. To this end, we essentially need the full model, as described in Sect. 3. The long-run equilibrium equation relates migration to real wage differences, which are determined in the model. For empirical research this is a challenging consequence of the model. First of all we have to find out where we are in terms of for instance Fig. 2. Is the economy that we are looking at initially to the left or to the right of the break point? This is important, because we like to know what happens if transportation costs change. In real world applications, however, we deal with a multi-region world and implicitly confront this multi-region model with break-points from the Tomahawk diagram in a 2-region setting, which is problematic. Similar analytical solutions for break-points in a multi-region setting (n > 2) only exist if all regions are at equal distance from each other. This assumption effectively means that the actual geography (where regions are located on a map) does not play a role, and thus that space is neutral in that sense. Any real world application clearly violates this assumption. How should we proceed to arrive at more conclusive evidence for our empirical hypothesis on transportation or trade cost induced agglomeration? 
Finally, New Economic Geography models predict that changes in transportation costs could result in changes in the degree of agglomeration through the relocation of the mobile factors of production. To test this hypothesis, we essentially need the full model, as described in Sect. 3. The long-run equilibrium equation relates migration to real-wage differences, which are determined within the model. For empirical research this is a challenging implication of the model. First of all, we have to find out where we are in terms of, for instance, Fig. 2: is the economy that we are looking at initially to the left or to the right of the break point? This is important because we would like to know what happens if transportation costs change. In real-world applications, however, we deal with a multi-region world, and implicitly confronting this multi-region world with break points from the Tomahawk diagram in a 2-region setting is problematic. Analytical solutions for break points in a multi-region setting (n > 2) only exist if all regions are at equal distance from each other. This assumption effectively means that the actual geography (where regions are located on a map) does not play a role, and thus that space is neutral in that sense. Any real-world application clearly violates this assumption. How should we proceed to arrive at more conclusive evidence for our empirical hypothesis on transportation- or trade-cost-induced agglomeration?

One option is to drop the 2-region model, with its analytical solutions, and instead use multi-region model simulations in which key equations are based on multi-regional estimates of

Eqs. (13c) and (13d). In Bosker et al. (2010) this is the preferred option. They show that, in a qualitative sense, the multi-region non-neutral-space model gives rise to the same conclusions as the simple 2-region version of the Puga (1999) model; that is, the results of the 2-region model carry over to the multi-region case. Given this result, the question “where on these curves are we?” can be answered in a simulation setting. Repeated simulations, using the estimated wage curves for different values of transportation costs, allow us to construct a multi-region version of Fig. 2. Confronting this curve with actual estimates of transportation costs gives us an idea of where on the curve we are and in what direction the economy is moving: towards further agglomeration or towards further spreading. It is clear, however, that evidence based on simulations is not the same as evidence based on actual estimation of the third New Economic Geography hypothesis using the structural equations of the model and real data.
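To give an impression of what such multi-region simulations involve, the following sketch solves a stylized Krugman (1991)-type model numerically; everything in it (parameter values, the line geography, the migration rule) is an illustrative assumption rather than an estimated quantity:

```python
# A stylized multi-region sketch in the spirit of the simulation approach
# described above (cf. Bosker et al. 2010): solve the short-run income,
# price-index, and wage equations of a Krugman (1991)-type model on a
# simple line geography for several transport-cost levels and let mobile
# workers migrate toward high real wages.

import numpy as np

sigma, mu = 5.0, 0.4                          # substitution elasticity, footloose share
n = 5                                          # regions on a line: space is non-neutral
dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

def short_run(lam, phi):
    """Fixed-point iteration for wages w and price indices P given worker shares lam."""
    w = np.ones(n)
    for _ in range(300):
        Y = mu * lam * w + (1.0 - mu) / n      # income of mobile + immobile factors
        P = (phi @ (lam * w ** (1.0 - sigma))) ** (1.0 / (1.0 - sigma))
        w_new = (phi @ (Y * P ** (sigma - 1.0))) ** (1.0 / sigma)
        w = 0.5 * w + 0.5 * w_new              # damped update for stability
    return w, P

for T in (1.2, 1.6, 2.1):                      # scan over transport-cost levels
    phi = (T ** dist) ** (1.0 - sigma)         # "free-ness" of trade between regions
    lam = np.full(n, 1.0 / n)
    lam[n // 2] += 0.01                        # tiny perturbation of the central region
    lam /= lam.sum()
    for _ in range(400):                       # slow migration toward high real wages
        w, P = short_run(lam, phi)
        omega = w * P ** (-mu)                 # real wage of mobile workers
        lam *= 1.0 + 0.5 * (omega / (lam @ omega) - 1.0)
        lam = np.clip(lam, 1e-9, None)
        lam /= lam.sum()
    print(f"T = {T}: long-run worker shares = {np.round(lam, 3)}")
```

Repeating such runs over a fine grid of transport-cost values traces out a multi-region analogue of the Tomahawk diagram in Fig. 2.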

How convincing is the empirical evidence for the New Economic Geography studies related to the three empirical hypotheses outlined at the beginning of this section? In addition to the comments made above for each of the hypotheses, four general problems for empirical confirmation of the geographical economics model stand out (see also Redding 2010):

(i) The findings are consistent not only with geographical economics models but also with other theories of trade and location (the Home Market effect can, for instance, also be found in other trade models; see also the discussion in Fingleton and Fischer (2010)).

(ii) Applying 2-region New Economic Geography models like Krugman (1991) to a multi-region world makes conclusive testing difficult or even outright impossible.

(iii) Causality: are the empirical observations caused by New Economic Geography forces or not? The empirical evidence indicates that wages (the left-hand side of the empirical specification of Eqs. (13c) and (13d)) are related to measures of market access. An important problem is whether this relation is causal. Higher wages in regions with good market access may be caused by better institutions in surrounding regions or by locational fundamentals instead of New Economic Geography forces, and the measures of market access might simply capture these more fundamental causes. This issue is more problematic for testing the Home Market effect (hypothesis 1) than for shock-sensitivity tests (hypothesis 2, where cause and effect are more clearly distinguished).

(iv) Using micro data: virtually all empirical New Economic Geography work is based on the representative firm and consumer framework and ignores the extensive micro data sets that have become available in recent years. Using these data (as in the urban economics literature), in combination with state-of-the-art empirical methods such as IV, difference-in-differences, or regression discontinuity designs, makes it possible to determine whether the agglomeration effects in the core are based on selection effects (truncation of the distribution) or on agglomeration as such (a rightward shift of the distribution); see Combes et al. (2008) or De La Roca and Puga (2017).

A way forward is the development of the so-called “quantitative spatial economics” approach, in which the key ingredients of the New Economic Geography model are combined with the basic determinants of household and firm location choices from other approaches, notably urban economics, in a rather pragmatic way that lends itself more easily to empirical and policy applications (see Redding and Rossi-Hansberg 2017). Another fruitful avenue for empirical (and policy) research is the deployment of multi-region versions of, at their core, New Economic Geography-type models to simulate or mimic the impact of large-scale (infrastructure) investments or shocks; see, for instance, Donaldson and Hornbeck (2016) or Donaldson (2018).

5 Policy Consequences

The New Economic Geography framework is widely used to discuss the policy implications of (local or national) interventions. This holds, for example, for many (regional) studies performed on behalf of the European Union and for the recent World Development Report 2009 by the World Bank (2008). A good summary of the six general policy conclusions based on the New Economic Geography model is provided by Ottaviano (2003) or Brakman et al. (2019, Chap. 11); for a more extensive treatment of these points see Baldwin et al. (2003):
– Regional side effects. One of the fundamental insights from New Economic Geography is that regions are connected and cannot be studied in isolation. Regional policy measures that, for example, affect economic integration have consequences for all regions, not only the region at which the measure is aimed. The effects of regional policies are economy-wide.
– Trade interaction effects. Outcomes of the model depend crucially on initial levels of economic integration. A similar policy measure can have different effects, depending on the initial position of the economy (see Fig. 2).
– Lock-in effects. Temporary measures can have permanent effects. Suppose that a temporary subsidy takes an economy over the break point in Fig. 2. It is then possible that a new long-run stable equilibrium is reached; the economy will remain there even when the subsidy is ended. History matters: if an economy finds itself in a stable equilibrium, strong policy measures might be needed in order to establish another equilibrium.
– Selection effects. Figure 2 indicates that if transport costs are low, two stable equilibria are possible. Selecting one of the possibilities can have huge consequences from a welfare perspective. For example, the immobile workers in the core region benefit from being located in the core, but policy makers have to decide whether they indeed give welfare in the core region a greater weight in social welfare considerations than welfare in the peripheral regions.
– Threshold effects. Policy measures can seem ineffective. The reason is that measures must take an economy over the break point in order to become effective.


– Coordination effects. New Economic Geography models are characterized by multiple equilibria. Especially in the overlap area in Fig. 2, expectations about the future of the economy can be important. If policy makers can convince firms and workers to relocate, this will start a self-sustaining move to a new equilibrium. A subsidy might take an economy toward a new equilibrium, but if policy makers can convince workers and firms that a specific region is the place to be, a subsidy is not even required.

The list suggests that a world characterized by New Economic Geography offers policy makers many attractive options. However, some qualifications are in order. First of all, which New Economic Geography model describes the world best: the stylized model depicted in Fig. 2, or one of its extensions? The model in Fig. 2 is to a large extent driven by a few parameters, and it is highly unlikely that the real world can be described by only those few parameters, which are most likely different across economies and periods (Combes 2011). Furthermore, in the model core regions are in general better off than peripheral regions, and the Tomahawk diagram of Fig. 2 suggests that it is always possible, with the right measure, to pick a preferred equilibrium. Neary (2001) strongly argues against this ‘picking equilibria’ role for the government, because, as we concluded at the end of the previous section, the empirical evidence for New Economic Geography is still too weak, and because such a policy would bear the risk of strategic, and wasteful, rent-seeking behavior by competing regions.

Still, we think that one very strong policy conclusion results from New Economic Geography: regions are not free-floating islands in space but are spatially interdependent. All too often, regional policies are designed to deal with a specific regional issue – like low wages or a lack of employment – and treat the region as if it were an island in space. Policy measures can then have unexpected results. An investment in, for example, the infrastructure of a peripheral region might not stimulate growth in the periphery but might instead strengthen the position of the core region, because further economic integration strengthens the core (see Fig. 2). This is probably – in a qualitative sense – one of the most important policy conclusions that can be derived from New Economic Geography. Indeed, this basic notion underlies recent policy-oriented work in the so-called quantitative spatial economics approach, where ‘New Economic Geography style’ spatial interdependencies are one of the key drivers of the locational choices of economic agents and where the impact of (policy-induced) large structural changes on those choices is analyzed (see, for instance, in an urban and intra-national setting, Redding and Sturm (2016) or Heblich et al. (2018) and, in an international setting, Desmet et al. (2018)).

6 Conclusions

We briefly discussed the structure of New Economic Geography models to argue that the “New” aspect of this type of model is the endogenization of the economic size of a location. The “Economic” aspect of the name refers to the economic tools used,


while the “Geography” part focuses on the crucial role of spatial interdependencies through transport and interaction costs. We then turned to the main empirical implications of New Economic Geography, as summarized in a number of empirical characteristics. Despite the surge in empirical research in this area, a number of crucial problems with empirically testing the New Economic Geography models remain. We listed four of these: (i) some effects can also be explained by other models in spatial economics at large; (ii) most tests in a multi-region world are only loosely based on a 2-region basic model; (iii) despite the application of more sophisticated empirical methods in recent years, causality problems are still rarely adequately addressed; and (iv) we need to integrate locational phenomena at different scales by continuing the trend to use more micro data. In view of our discussion of these shortcomings, the main policy implications (as discussed in Sect. 5) are still mostly qualitative, thus lacking a solid quantitative basis in most applied work. Here our money is on the latest incarnation of New Economic Geography, quantitative spatial economics (Redding and Rossi-Hansberg 2017), to take over the mantle as the most promising approach in mainstream economics to analyze the causes and consequences of location choices in a setting where spatial linkages matter.

7 Cross-References

▶ Classical Contributions: Von Thünen and Weber
▶ Factor Mobility and Migration Models
▶ Interregional Trade: Models and Analyses
▶ New Economic Geography and the City
▶ Path Dependence and the Spatial Economy: Putting History in Its Place
▶ Schools of Thought on Economic Geography, Institutions, and Development

Acknowledgements This chapter is partially based on earlier work by the authors (corresponding author: Charles van Marrewijk). We do not give detailed references to our own work, but readers interested in further details can consult our book on urban and geographical economics (Brakman et al. 2019) for more extensive and detailed discussions and references; see in particular chapters 7, 8, and 9.

References
Baldwin R, Okubo T (2006) Heterogeneous firms, agglomeration and economic geography: spatial selection and sorting. J Econ Geogr 6(3):323–346
Bosker M, Brakman S, Garretsen H, Schramm M (2010) Adding geography to the new economic geography: bridging the gap between theory and empirics. J Econ Geogr 10(6):793–823
Brakman S, Garretsen H (2009) Trade and geography: Paul Krugman and the 2008 Nobel prize in economics. Spat Econ Anal 4(1):5–23
Brakman S, Heijdra BJ (eds) (2004) The monopolistic competition revolution in retrospect. Cambridge University Press, Cambridge


Brakman S, Garretsen H, Schramm M (2004) The spatial distribution of wages: estimating the Helpman-Hanson model for Germany. J Reg Sci 44(3):437–466
Brakman S, Garretsen H, Schramm M (2006) Putting new economic geography to the test: free-ness of trade and agglomeration in the EU regions. Reg Sci Urban Econ 36(5):613–636
Brakman S, Garretsen H, Van Marrewijk C (2019) A spiky world: an introduction to urban and geographical economics. Cambridge University Press, Cambridge
Combes P-P (2011) The empirics of economic geography: how to draw policy implications? Rev World Econ 147(3):567–592
Combes P-P, Duranton G, Gobillon L (2008) Spatial wage disparities: sorting matters! J Urban Econ 63(2):723–742
Davis DR (1998) The home market, trade and industrial structure. Am Econ Rev 88:1264–1277
Davis DR, Weinstein DE (2002) Bones, bombs and break points: the geography of economic activity. Am Econ Rev 92(5):1269–1289
Davis DR, Weinstein DE (2003) Market access, economic geography and comparative advantage: an empirical assessment. J Int Econ 59(1):1–23
De La Roca J, Puga D (2017) Learning by working in big cities. Rev Econ Stud 84:106–142
Desmet K, Nagy DK, Rossi-Hansberg E (2018) The geography of development. J Polit Econ 126:903–983
Dixit A, Norman V (1980) Theory of international trade. Cambridge University Press, Cambridge, UK
Dixit A, Stiglitz J (1977) Monopolistic competition and optimum product diversity. Am Econ Rev 67(3):297–308
Donaldson D (2018) Railroads of the Raj: estimating the impact of transportation infrastructure. Am Econ Rev 108(4–5):899–934
Donaldson D, Hornbeck R (2016) Railroads and American economic growth: a “market access” approach. Q J Econ 131(2):799–858
Feenstra RC (2016) Advanced international trade: theory and evidence, 2nd edn. Princeton University Press, Princeton
Fingleton B, Fischer MM (2010) Neoclassical theory versus new economic geography: competing explanations of cross-regional variations in economic development. Ann Reg Sci 44(3):467–491
Fingleton B, McCann P (2007) Sinking the iceberg? On the treatment of transport costs in new economic geography. In: Fingleton B (ed) New directions in economic geography. Edward Elgar, Cheltenham, pp 168–204
Fujita M, Krugman PR, Venables AJ (1999) The spatial economy: cities, regions, and international trade. MIT Press, Cambridge, MA
Glaeser E (2008) Cities, agglomeration, and spatial equilibrium. Oxford University Press, Oxford, UK
Grubel HG, Lloyd P (1975) Intra-industry trade: the theory and measurement of international trade in differentiated products. Macmillan, London
Head K, Mayer T (2006) Regional wage and employment responses to market potential in the EU. Reg Sci Urban Econ 36(5):573–594
Heblich S, Redding SJ, Sturm DM (2018) The making of the modern metropolis: evidence from London. CEPR discussion paper 13170, London
Helpman E, Krugman P (1985) Market structure and foreign trade: increasing returns, imperfect competition, and the international economy. MIT Press, Cambridge, MA
Henderson JV (1974) The sizes and types of cities. Am Econ Rev 64:640–656
Krugman P (1979) Increasing returns, monopolistic competition and international trade. J Int Econ 9(4):469–479
Krugman P (1980) Scale economies, product differentiation, and the pattern of trade. Am Econ Rev 70(5):950–959
Krugman P (1991) Increasing returns and economic geography. J Polit Econ 99:483–499
Krugman PR, Venables AJ (1995) Globalization and the inequality of nations. Q J Econ 110:857–880


Melitz MJ (2003) The impact of trade on intra-industry reallocation and aggregate industry productivity. Econometrica 71(6):1695–1725
Ottaviano GIP (2003) Regional policy in the global economy: insights from the new economic geography. Reg Stud 37(6–7):665–673
Puga D (1999) The rise and fall of regional inequalities. Eur Econ Rev 43(2):303–334
Redding SJ (2010) The empirics of new economic geography. J Reg Sci 50(1):297–311
Redding S, Rossi-Hansberg E (2017) Quantitative spatial economics. Annu Rev Econ 9(1):21–58
Redding SJ, Sturm DM (2008) The costs of remoteness: evidence from German division and reunification. Am Econ Rev 98(5):1766–1797
Redding SJ, Sturm DM (2016) Estimating neighborhood effects: evidence from war-time destruction in London. Mimeo. http://www.princeton.edu/~reddings/papers/LWW2-9Mar16.pdf
Venables AJ (1996) Equilibrium locations of vertically linked industries. Int Econ Rev 37:341–359
World Bank (2008) World development report 2009. World Bank, Washington, DC

Further Reading
Baldwin R, Forslid R, Martin P, Ottaviano GIP, Robert-Nicoud F (2003) Economic geography and public policy. Princeton University Press, Princeton
Brakman S, Garretsen H, Van Marrewijk C (2019) A spiky world: an introduction to urban and geographical economics. Cambridge University Press, Cambridge
Combes P-P, Mayer T, Thisse J-F (2008) Economic geography. Princeton University Press, Princeton
Fujita M, Krugman PR, Venables AJ (1999) The spatial economy: cities, regions, and international trade. MIT Press, Cambridge, MA
Head K, Mayer T (2004) The empirics of agglomeration and trade. In: Henderson V, Thisse J-F (eds) Handbook of regional and urban economics, vol IV. North Holland, Amsterdam, pp 2609–2665
Neary JP (2001) Of hype and hyperbolas: introducing the new economic geography. J Econ Lit 39(2):536–561

Relational and Evolutionary Economic Geography

Harald Bathelt and Pengfei Li

H. Bathelt (*), Department of Political Science and Department of Geography and Planning, University of Toronto, Toronto, ON, Canada, e-mail: [email protected]
P. Li, Department of International Business, HEC Montréal, Montréal, QC, Canada, e-mail: [email protected]

Contents
1 Introduction
2 Segmented Cluster Paradigms
3 Network Relations and the Knowledge-Based Conception of Clusters
4 Regional Path Dependence and Cluster Life Cycles
5 Toward an Integrated Relational-Evolutionary Model of Cluster Dynamics
6 Conclusions
7 Cross-References
References

Abstract

Since the 2000s, economic geography has encountered increasing interest and debate about evolutionary and relational thinking in regional development. Rather than opposing the two approaches, this chapter investigates how they can complement one another and be applied to specific research fields in economic geography. A direct comparison would be difficult because the approaches are situated at, and address, different levels of the research process. To demonstrate the potential of combining the two approaches, this chapter aims to conceptualize cluster dynamics in an integrated relational-evolutionary perspective. In recent years, research on clusters has experienced a paradigmatic shift from understanding their network structure to analyzing dynamic changes. Within this context, inspired by relational and evolutionary thinking, a comprehensive tripolar analytical framework of cluster evolution is developed that combines the three


concepts of context, network, and action, allowing each to evolve in interrelation with the others. Through this, the chapter argues that relational and evolutionary accounts, rather than being viewed as competing approaches to economic geography, can, in an integrated form, become fundamental guides to economic geography research.

Keywords

Evolutionary approach · Economic geography · Path dependence · Local agent · Cluster dynamic

1 Introduction

After vivid conceptual debates in economic geography in the 2000s, two approaches that will be discussed in this chapter have received substantial attention in the academic community: relational and evolutionary perspectives. While some scholars compare both perspectives as competing conceptualizations (e.g., Hassink and Klaerding 2009), we believe that such a comparison is not easily possible. There are two reasons for this: First, relational economic geography is a broad term which encompasses a number of approaches that relate to different research traditions, stretching from critical-realist and poststructuralist to actor-network theorizations – which makes it difficult to critique this work as a homogenous body of research – while evolutionary economic geography has developed more narrowly out of evolutionary economics. Second, relational perspectives address meta-theoretical aspects of how to position analyses in economic geography, which questions to ask, and how to conceptualize specific problems. In contrast, evolutionary approaches are situated at the concept level and often involve a specific quantitative methodology to analyze a problem. Both approaches are also still in a stage of development.

Relational perspectives were designed as a multidisciplinary alternative to narrow regional science approaches, which were primarily based on conventional neoclassical economics. Such work aimed to explain economic landscapes by introducing spatial variables into economic models. Although conventional analyses did not always strictly follow this line of thinking, much of the work was characterized by a meso-/macro-perspective, the treatment of spatial entities as if they were actors (while neglecting the real actors, i.e., individuals, firms, and other organizations), a neglect of wider social relations, and a lack of process analysis. Relational approaches instead view economic action as social practice, conduct microlevel reasoning, investigate how institutions stabilize economic relations, explore social and economic processes, and analyze the effects of global production and the connection between local and global scales (e.g., Boggs and Rantisi 2003; Faulconbridge 2017).

In the conceptualization of Bathelt and Glückler (2011, 2017), which provides the reference point for the arguments presented here, relational economic geography is a


meta-conceptualization for formulating research questions and conducting research in economic geography. This conceptualization provides a bottom-up logic of how economic action unfolds in a spatial perspective and leads to wider spatial patterns that can differ from place to place. It includes a structural and an evolutionary component. The structural component refers to the role of context: economic agents are situated in structures of social relations from which they cannot easily separate (Granovetter 1985). Firms in clusters and global value chains are, for instance, embedded in networks of knowledge flows and supplier-producer-user relations, which are key when making decisions about product changes. The evolutionary component refers to the fact that economic action is path dependent. Past decisions and traditions of social relations provide preconditions for today’s actions and thus impact contemporary decision-making. At the same time, the relational approach rejects the assumption that such patterns can be extrapolated into the future. Economic action is seen as fundamentally open-ended and contingent, since agents are free to deviate from preexisting structures and development paths. From this, it is suggested that the complex underlying structures of organization, interaction, innovation, and evolution in a spatial perspective are at the core of enquiries in economic geography.

Similar to relational perspectives, evolutionary approaches are based on a critique of conventional research that is mostly static. Evolutionary approaches to economic geography aim to analyze dynamic changes in economic landscapes, often based on conceptualizations from evolutionary economics (Martin and Sunley 2006; Kogler 2015). Focusing on the regional (or national) level, much of the work analyzes the effects of processes of selection, mutation, variation, and chance on the development of firm populations. Within this context, related work investigates processes of establishing regional variety and selecting alternatives from this variety. The idea behind this is that “related variety” between local/regional industry sectors enables spillover processes, supports innovation, and produces regional advantage (Frenken et al. 2007). Over time, selection processes lead to specific regional development paths. While research on the establishment of new trajectories is still at an early stage, another stream of the literature analyzes path-dependent regional development and potential lock-in processes (Grabher 1993; Henning et al. 2013).

Although offering new insights for studies in economic geography, both perspectives have shortcomings. Much of the empirical work using a relational framework aims at understanding why specific economic networks exist, what the nature of social relations is, and why this differs from place to place, while neglecting the dynamics of such structures. Vice versa, evolutionary approaches focus on regional economic dynamics and the identification of trajectories at a meso-/macro-level, while neglecting the underlying structures of socioeconomic relations. In fact, although evolutionary approaches are often based on a firm perspective, the actual analysis addresses aggregates, such as regional structures and developments, and derives general statements about, for instance, the persistence of regional distributions. These differences are illustrated further in the next section, which directs attention to industrial clusters as the unit of analysis.

2 Segmented Cluster Paradigms

Arguably, empirical and conceptual analyses of industrial agglomerations and clusters have been at the core of much of the work in economic geography over the past three decades. Within this context, relational and evolutionary approaches have developed in two successive stages of the discussions of industrial clusters. Initially, academic interest was attracted by the robust growth of certain industrial districts, clusters, or regional innovation systems. (This early stage of cluster research is, in fact, only partially relational, since much of this work lacks dynamic components. This may explain why some scholars interpret relational approaches as static (Martin and Sunley 2006; Hassink and Klaerding 2009).) A consensus about the structure of these competitive regions was that they combine economic activities and culture at the local level through untraded (aside from traded) linkages, echoing the social-embeddedness argument in economic sociology (Granovetter 1985). It was argued that networks of local agents, which are often associated with mutual trust, provide a third way of governing economic relations beyond the dual structure of market and hierarchy. They generate regional prototypes of tacit knowledge where new knowledge is constantly being created and successfully shared (Malmberg and Maskell 2002). Later, prompted by changes in regional configurations, such as the Third Italy (Hadjimichalis 2006) and Silicon Valley (Saxenian 2006), a transition has taken place from a static to a more evolutionary view of clusters. This has given rise to a new evolutionary approach to clusters that focuses on dynamic changes, drawing inspiration from concepts such as path dependence, lock-in, and industry life cycles (Frenken et al. 2007; Henning et al. 2013).

Until now, the two dimensions – network and evolution – have remained relatively unconnected in the literature on industrial clusters. In both perspectives, broader levels of change in social networks and local culture and their impact on cluster evolution have rarely been discussed. This chapter argues that a close link between network dynamics and cluster evolution needs to be established to develop a coherent conceptualization of clusters. Without changes in networks and conventions, a regional renaissance would merely be a “flash in the pan,” induced by temporary increases in demand: signs of recovery would not lead to a continuation of the evolutionary path or life cycle of a cluster. Without an evolutionary perspective, regional success would be determined by the existing local manufacturing culture (Gertler 2004), which would similarly provide only a partial understanding.

To bridge the gap between narrow relational and evolutionary perspectives in cluster research, this chapter formulates a tripolar framework for the analysis of cluster dynamics through contextualized theoretical construction (Li et al. 2012). The tripolar framework builds on the pillars of context, network, and action, integrating them in an organic way at the local/regional level. This is not an attempt to establish a global model of cluster dynamics. Rather, by contextualizing social networks, we emphasize the possibility of network dynamics and, hence, varied effects of networks on local agency over time. As such, contextualized networks help explain and understand deeper transformations inside clusters. Furthermore, by placing networks in dynamic context-action configurations, we indicate how new


cluster paths can be created through structuration processes that are initiated by local agents (Giddens 1984). Following this agenda, the chapter is structured as follows: The next section discusses relational cluster conceptions that focus on the network paradigm, drawing particularly on the knowledge-based buzz-and-pipeline model. Then, we present an overview of evolutionary cluster conceptions. From a critique of both types of approaches, we develop a reconceptualization of cluster dynamics in an integrated relational-evolutionary way, before presenting concluding remarks.

3 Network Relations and the Knowledge-Based Conception of Clusters

Traditionally, work on industrial agglomerations or regional industry clusters has emphasized the role of cost advantages, especially low transportation and transaction costs and close material linkages within such settings. Krugman (1991), for instance, stressed the importance of cost incentives for suppliers to locate close to an existing industrial agglomeration, as well as advantages of agglomeration from a labor market perspective. As contributions by Storper and Salais (1997), Malmberg and Maskell (2002), and others have emphasized, however, it is necessary to go beyond cost factors to more fully understand the processes underlying regional specialization and concentration. In drawing on “localized capabilities” and “untraded interdependencies,” broader conceptualizations of regional clusters acknowledge the importance of socio-institutional settings, interfirm knowledge flows, and interactive learning in regional innovation and growth. From this understanding, a research tradition has developed that stresses the importance of network linkages and producer-user relations in clusters.

Focusing on local interfirm linkages, Malmberg and Maskell (2002) emphasize the vertical and horizontal dimensions of relationships in clusters. While the former refer to firms that are linked through input–output linkages and value-chain-based relations, the latter relate to firms that produce similar products and compete against one another; these firms learn by monitoring and comparing themselves with other firms. Although Malmberg and Maskell (2002) point out that, in order to establish a theory of clusters, it would be necessary to understand which factors support the continued growth of clusters and how they are reproduced, much of the existing work has not developed a dynamic or evolutionary perspective.

This is also reflected in the buzz-and-pipeline model of clusters (Bathelt et al. 2004), which suggests that the growth of a cluster depends on systematic linkages between its internal networks, conceptualized as “local buzz,” and its external knowledge and market environment, referred to as “global pipelines.” Within the cluster, specific information about technologies, markets, and strategies is exchanged in a variety of ways in planned and unplanned meetings. Based on a shared institutional background, firms learn how to interpret local buzz and make good use of it. Participation in this buzz does not require specific investments, since the firms are surrounded by a tight web of opinions, recommendations, judgments, and interpretations (Storper and Venables 2004).


While local buzz supports internal coherence, a cluster’s competitive success and growth strongly depend on its external linkages (Owen-Smith and Powell 2004). Since access to global or trans-local markets and knowledge is not free, considerable search efforts have to be undertaken to find the right partners – a process that entails high investments and uncertainties. External relationships also require building trust, which is a time-consuming and costly process. The buzz-and-pipeline model suggests that the local information and knowledge ecology is of only limited effect in the absence of trans-local connections (Bathelt et al. 2004, 2018). The more strongly the actors in a cluster are involved in establishing and maintaining external partnerships, the more information about new markets and technologies is pumped into the cluster’s networks (Fitjar and Rodríguez-Pose 2011). Without this influx of external knowledge, there is a danger that firms miss out on new opportunities or pin their hopes on the wrong technologies. Vice versa, without local buzz, the cluster’s external pipelines are of little use: local buzz enables firms to rapidly filter out from the mass of external information those elements that are particularly important for the development of technologies (Bathelt and Glückler 2011). Although related cluster approaches often draw on dynamic concepts, such as growth and reproduction, they are mostly static in character. They focus on network aspects and do not conceptualize the genesis and evolution of clusters (Maskell and Malmberg 2007).
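Although the buzz-and-pipeline model is conceptual rather than formal, its core diffusion logic can be illustrated with a toy simulation (our illustration, not part of the original model): external knowledge enters at a few hypothetical “pipeline” firms and then spreads through a dense local “buzz” network.

```python
# Toy diffusion sketch of the buzz-and-pipeline logic; all numbers are arbitrary.

import numpy as np

rng = np.random.default_rng(1)
n = 30                                       # firms in the cluster
adj = rng.random((n, n)) < 0.15              # sparse random local contact network
adj = np.triu(adj, 1)
adj = (adj | adj.T).astype(int)              # symmetric adjacency matrix

def rounds_until_all_informed(n_pipelines):
    """Rounds of local buzz needed until external knowledge reaches every firm."""
    informed = np.zeros(n, dtype=bool)
    informed[:n_pipelines] = True            # firms holding external pipelines
    rounds = 0
    while not informed.all() and rounds < 100:
        informed |= (adj @ informed.astype(int)) > 0   # buzz passes knowledge on
        rounds += 1
    return rounds

for k in (1, 3, 6):
    print(f"{k} pipeline firm(s): knowledge reaches all firms in "
          f"{rounds_until_all_informed(k)} buzz rounds")
```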

4 Regional Path Dependence and Cluster Life Cycles

The growing interest in cluster dynamics originates from the failure of conventional static models to explain local crises and structural changes. A dilemma of such research is that localized benefits are expected to materialize once a cluster exists; the question of how cluster structures emerge in the first place is neglected in this work or viewed as an “individualistic” process. Since the factors that support a cluster’s genesis may differ from those that support its ongoing growth (Bresnahan et al. 2001), a systematic conceptualization of clusters requires a dynamic component.

One strand of the literature on cluster dynamics focuses on the concepts of path dependence and lock-in related to evolutionary theories. A conceptual challenge when applying metaphors from evolutionary economics or evolutionary biology to economic geography is, of course, to justify the transferability of path-dependence explanations – originating from microlevel analyses of organizational behavior – to the aggregate local/regional level. A natural way of justifying the use of evolutionary ideas at the local level is to demonstrate that geography matters in the realization of path-dependent processes. Such processes – be they related to technological lock-in, externalities, or institutional inertia – do not occur in a spaceless world. The idea that a firm’s interactive learning processes, strategic choices, and organizational routines are shaped by the local cultural and institutional environment has been repeatedly pointed out in the network tradition of cluster research. In this view, path dependence is associated with a place-dependent evolutionary process (Martin and Sunley 2006).


Various empirical studies add to this argument by illustrating that regional path dependence can persist over a very long time period (Grabher 1993; Saxenian 2006; Henning 2019). Further theoretical exploration of evolutionary thinking in economic geography goes beyond preliminary claims of regional stability. History matters but does not determine future trajectories of clusters (Bathelt and Glückler 2011). Related to this, it appears that path dependence overaccentuates the continuity and stability of regional developments, while discontinuities and structural crises, which are equally if not more important, are rarely conceptualized.

To conceptualize structural change and path dependence in a consistent manner requires a different interpretation of clusters. The traditional path-dependence model treats clusters as homogenous entities that form the unit of analysis. Even though the social relations of firms are acknowledged in the localization process of a new path, the central focus is the overall intensity of local networks rather than their internal structure, let alone changes in the network structure of social relations. Since the position in a network impacts the kind of knowledge an agent can receive, a diversified set of agents is likely associated with more diversified structures of local knowledge flows and networks. Therefore, new path creation in clusters – an outcome driven by the interaction between agents – is more likely to occur in regions with varied structures (Frenken et al. 2007; Boschma and Iammarino 2009; Kogler 2015). The focus on the heterogeneous and diverse nature of regions revitalizes evolutionary thinking in economic geography. In a transformational context, re-bundling processes in a region without diversified networks may lead to the development of hollow instead of renewed clusters (Bathelt et al. 2004).

By viewing regions as composite systems and drawing inspiration from evolutionary ideas in political science, Martin (2010) puts forward an alternative model of path dependence to highlight dynamic path processes. By inspecting the interactions of agents in different network positions, Sydow et al. (2010) make an effort to combine conceptions of path dependence with Giddens’ (1984) structuration theory, thus trying to disentangle the underlying agency processes of new path creation in clusters. The tripolar framework, developed in the next section, draws on a similar conceptualization.

Although contributions on new path creation complement interpretations of clusters as quasi-permanent structures, evolutionary perspectives are, thus far, still limited by a relatively narrow focus of analysis. In views of path dependence, singular interpretations dominate, whereby cluster dynamics are restricted both theoretically and empirically to industry, technology, or institutional structures. In contrast, aspects of the coevolution of interrelated economic, technological, institutional, and sociocultural arenas – “a key issue for further research” (Martin and Sunley 2006, p. 413) – are remarkably under-conceptualized. Singular views of evolutionary dynamics are also strong in regional analyses. In conventional studies, the evolution of clusters has primarily been explained at the local level, leading MacKinnon et al. (2010) to criticize evolutionary economic geography for neglecting social structure, labor relations, and capital accumulation at a broader macro-level (Henning 2019).


In globalized competition, especially in capital-intensive industries, influences at the national and international scale are indispensable for understanding evolutionary processes. The need to go beyond a regional theorization of cluster evolution is also propelled by international technical communities that promote cooperation and competition between clusters (Saxenian 2006).

Other conceptualizations of cluster evolution draw on industry or product life-cycle theories (Klepper 1997). Industry life-cycle theories suggest that a dominant product design does not exist during the early stages of industrial development and that new technologies only flourish in selected areas. With increasing maturity, markets become more stable, knowledge gets codified, and dominant technologies emerge. Since the communication of tacit knowledge and technological innovation are key features of innovative clusters, a natural corollary of life-cycle theories is that innovative clusters most likely develop in an early rather than a mature stage of industrial development (Iammarino and McCann 2006). The point of this argument is that the evolution of clusters corresponds with and follows from the technological paradigm of those industries that form their bases. Cluster life cycles are, in this view, often a regional version of industry life cycles.

The strong emphasis on the role of technology in cluster dynamics in these approaches, however, adds an element of determinism to the explanation. When using technological paradigms to explain the rise or failure of industrial clusters, there is a danger of a posteriori reductionist reasoning. It is easy to explain the technological changes of an industry when looking back, but difficult to foretell what will happen in the future. Accordingly, only after clusters succeed or fail can the rationality of a technology regime be fully understood. In practice, however, a change in technology is not an external factor that determines a cluster’s evolution but the outcome of the interconnected choices and actions of firms along the dynamics of cluster development. Cluster-, industry-, and product life-cycle theories predict technological change in a deterministic manner rather than explaining the origins of technologies as resulting, for instance, from cluster innovation.

Other cycle conceptualizations differ in that they presume that cluster life cycles are existent rather than constructed. Related research pays attention to uncovering the characteristic forces at each stage of a cluster cycle. Accordingly, different forces have been identified through which clusters move from one stage to another (Maskell and Malmberg 2007; Menzel and Fornahl 2009). Early discussions of this type of cluster cycle implied that clusters experience a unidirectional stage-to-stage development. To free clusters from such deterministic reasoning, Menzel and Fornahl (2009) add feedback loops allowing clusters to jump back to earlier stages during the sustainability or decline stages. However, such relaxation of stage rigidity only alleviates the mechanical characteristics of cluster cycles: it remains problematic to assume that a single life cycle could cover the diverse trajectories of clusters in different real-world contexts. Even Martin and Sunley’s (2011) attempt to conceptualize cluster dynamics as an adaptive-cycle model drawing on evolutionary ecology does not fully overcome the idea of a “natural” development trajectory.
They suggest that cluster evolution proceeds through different stages that can lead to continuous cyclicity but can also get stuck in stages of ongoing adaptation, stabilization, reorientation/renewal, or the decline and disappearance of existing clusters.


Although evolutionary and life-cycle conceptualizations of cluster dynamics draw from different origins, they share two aspects in their theoretical construction. First, in both discussions cluster dynamics are typically conceptualized from model assumptions rather than derived from within their regional or national contexts. The forces driving cluster evolution are, in many models, situated at aggregate levels beyond the individual agent. Such a way of conceptualizing risks over-abstraction, potentially losing sight of interesting insights happening “on the ground.” Studies may thus dismiss the diversity of trajectories of cluster development. Second, there is an inclination to disregard changes in the underlying social structure. In cluster life cycles, different stages are mainly distinguished by observable indicators, such as firm size and number of employees, or by indicators that are less easily measurable, such as technology and the diversity of local knowledge pools. In terms of the institutional dimension, it is institutional inertia rather than the reform of institutions that is captured by lock-in processes. Local business culture and social networks, which have been extensively discussed in the network paradigm of clusters, are deliberately excluded from these theoretical frameworks of cluster dynamics. It is suggested that path-dependent evolution leads to long-term stability in, or irreversibility of, spatial industry patterns (e.g., Boschma and Iammarino 2009). The problem with this view is that the seeming stability of aggregate patterns hides changes in the social structure and network relations underlying these meso-macro patterns.

As significant as evolutionary and life-cycle conceptualizations may be, their concentration on normative descriptions of cluster dynamics draws attention away from the analytical concerns of cluster theories. In a different approach, the theorization developed in the next section aims to frame the relationships of those forces enabling and shaping diverse trajectories of clusters. Based on observations of the economic agents’ behavior at the local level, as well as beyond, this conceptualization aims to extract the key influences on cluster dynamics in the long run. Instead of asking how clusters will evolve, our analytical framework gives priority to the question of why clusters change. It does not presume the existence of a general theory that extracts the critical forces behind the dynamics of clusters.

5 Toward an Integrated Relational-Evolutionary Model of Cluster Dynamics

Any conceptualization of clusters presumes an interpretation of what a cluster is. In our view, a cluster is neither an organism, which can grow and decline per se, nor an entity that can be described by a single rationality or technology. In the tripolar framework, a cluster is a group of agents and firms that are bound together geographically, technologically, and relationally. In this vein, the trajectories of clusters are aggregate – planned as well as unanticipated – outcomes of the individual choices and actions of local agents, as well as the synergies that derive from them. Analytical frameworks of cluster dynamics therefore need to be formulated in relation to the actions and motivations of local agents.


Fig. 1 A tripolar analytical framework of cluster evolution, linking context, network, and action. (Source: Li et al. (2012, p. 133))

From the contingent, relational, and accumulated characteristics of local agents’ behaviors (Bathelt and Glückler 2011, 2017), three important pillars are identified as central analytical categories in the tripolar framework. These are context, network, and action, bound together in a reflexive manner that stimulates an evolutionary dynamic (Fig. 1).

Context. The actions of local agents are contingent, which makes them hard to predict. Contingency is directly related to the first pillar of our framework: the specific context in which actors are situated. By context, we mean the economic and institutional structures influencing local actors in the process of making and fulfilling decisions. This influence also includes the results of previous actions of other agents. The economic structure of clusters involves industry and market characteristics, technological patterns, intra-firm organization, and the dominant interfirm linkages inside and beyond the region. The institutional structure, in turn, refers to local and nonlocal political regimes, routines, conventions, and value and belief systems. The economic and institutional settings, which are structured by the division of labor and the geographical distance between activities (Storper 2009), influence the local actors’ knowledge base and their interaction. When applied to cluster evolution, the economic and institutional dimensions of context are often blurred, since long-term interfirm connections can form powerful interest groups that stabilize the local institutional context (Grabher 1993).

Context both constrains and enables action in clusters. From a psychological and pragmatic perspective, Storper (2009, p. 13) proposes an informational interpretation of context, the structural component of which “is defined by the division of labor in which the actor finds himself, which has a decisive influence on the information


environment for the individual, hence his ‘input’ structure of cues and reference points.” In this sense, context has an impact on the ways in which actors find and apply information and knowledge, leading them to choose certain actions over others. The relationship between context and action is neither predetermined nor normative. A specific context does not determine what actors do but limits the ways of coordinating actions in a given situation. In other words, there are different frameworks of action in possible worlds of production; yet, in a certain context, some coordinated collective actions are more likely than others (Storper and Salais 1997). The effects of context on performance are not predetermined either, as they can be positive or negative (Storper 2009). On the negative side, the practical environment of actions restricts what kind of knowledge local actors may receive. On the positive side, context enables what agents in clusters can do by creating a bias toward certain kinds of knowledge. Therefore, for local agents, the question is not how to escape restricted contexts and/or enter more beneficial environments, but how to reflexively interpret practical situations and make appropriate adjustments. Context becomes an important influence once it has been internalized into the actors’ motivations and behavior.

In sum, the constitution of context reflects the duality of structure, both as a medium and an outcome of the agents’ practices (Giddens 1984). From a structural perspective, actions are structured by contexts. At a particular time, for a specific local actor, context is a given constraint. Over a long time span, however, contexts are constructed by actions and are thus variable. Routines and conventions of doing business are formed on the basis of foreseeable expectations about the mutual behavior of others, as an outcome of recursive interaction. Ongoing interfirm relations are consolidated through the success of series of transactions. The competitive patterns of industries are shaped by the choices and practices of actors in comparison with those of their competitors. Context is thus not a predestined background against which agents make choices and take actions; rather, it is constructed and sustained by the ongoing practices of all agents. This means that the context in which all agents are situated usually cannot be controlled by single or exclusively local agents. At the cluster level, actors can modify their context in several ways, but there are also important components that are out of the hands of local agents. Although firms in clusters may engage in collective action to alter the supply conditions of a specific industry, they cannot easily change the demand of customers directly or influence national macroeconomic policies, legislative frameworks, and education systems.

Network. Network refers to the contextualized social relations of agents and firms within, but not limited to, the local production system (Glückler and Panitz 2016). The wider structure of input–output linkages of firms also becomes part of the economic context of agents in clusters. In practice, social and economic relations are inseparable, indicating that traded and untraded interdependencies (Storper and Salais 1997) are closely interwoven. As to traded linkages, the incompleteness of contracts leaves room for the development of trust (or distrust) between partners in the negotiation and during the course of economic transactions. Relations of mutual trust become indispensable for traded interdependencies.
Social networks in some places also originate out of economic rationales. Although not originally part of economic transactions, new personal relationships may be established over time through repeated economic transactions.


In the end, however, it is the compatibility and inspiration of personalities that trigger the formation of new social networks at the person-to-person level. Economic transactions offer opportunities for interaction and communication, on the basis of which some personal relationships develop and others do not. At the regional level, personal social networks may exist before the formation of clusters. Such networks can become a key mechanism in the diffusion of market and technology information and develop into a cluster later on (Li and Bathelt 2018).

Changes in value and belief systems, advances in telecommunication technologies, and the intensification of interfirm competition can trigger a transformation of personal interaction toward a broad societal level beyond the region. Reflecting trust and ontological security systems in different societies, kinship relations can be viewed as providing a stable mode of organizing personal relations in the premodern societal context. These relations have been substituted by relationships of friendship or emotional intimacy in modern society. It is thus reasonable to assume that for clusters in developing or transitional economies, the structures of social networks at the personal level will also change in the modernization process. Such a change in basic personal networks impacts strategic actions within clusters; yet, to maintain personal relations requires regular interaction and communication. Networks in this sense are “as much process as they are structure, being continually shaped and reshaped by the action of actors who are in turn constrained by the structural positions in which they find themselves” (Nohria 1992, p. 7). By viewing networks as dynamic connections within heterogeneous contexts that are shaped by actions, the context-network-action framework conceptualizes deeper changes in the socioeconomic structure of clusters.

Action. Even though context and network offer powerful insights into the behavior of local agents, action still needs to be treated as a separate pillar in our framework, because experience from action develops in a cumulative fashion and agents learn based on their absorptive capability (Cohen and Levinthal 1990). At both the individual and organizational levels, prior related knowledge helps and directs agents to use and assimilate new knowledge. The more specialized knowledge agents have previously acquired, the faster they can learn within their context or network. The role of the agents’ absorptive capability suggests that learning is a cumulative and path-dependent process with self-reinforcing characteristics. A conceptualization of cluster evolution without action would bear the risk of overemphasizing exogenous variables. Action refers to the individual level of decision-making that depends on specific personal and internal organizational structures, as opposed to the external context.

In our framework, context, network, and action are equally indispensable. Conceptualizing cluster dynamics without recognizing all three pillars provides only a partial understanding. Merely emphasizing the role of stable networks in local actions risks failing to understand diversifying patterns in transitional or developmental contexts. Conceptualizations that limit themselves to emphasizing the importance of external contexts for actions conversely neglect the role of human agency in regional practices (Scott 2006).

[Fig. 2 Evolutionary dynamics in the tripolar cluster conception: vicious cycles appear as a contractive movement of the three pillars from Context I, Network I, and Action I inward; virtuous cycles appear as an expansive movement toward Context II, Network II, and Action II]

Also, a theorization of actions divorced from the agents’ network and context would lead us to view clusters either as organisms or as groups of unrelated agents, neither of which reflects real-world structures.

In sum, the tripolar framework offers a systematic way of studying and interpreting the evolution of clusters. At the regional level, it is the interaction of these pillars that explains the evolution of clusters, yet the framework does not produce ideal-type cluster visions, since the dynamics of the three pillars can work in both vicious and virtuous ways:

I. Vicious Cycles. We refer to interrelationships between the pillars as vicious if they produce lock-ins and result in regional decline, as illustrated by the contractive interactive movement of the three pillars in Fig. 2. In the literature, economic crises in industrial districts are often explained by changes in economic contexts, such as a sharp drop in demand or the appearance of new technologies. But a transformation of the external environment accounts for only one part of the overall stagnation of clusters. Weaknesses within the networks of a region can also be responsible for the rigidity of old industrial areas. Reasons for regional failure can be classified as different forms of lock-in (Grabher 1993; Henning et al. 2013), which are consequences of interrelationships between the three pillars. First, decades of cooperation (action) in infrastructure projects and subsidy programs may stabilize intensive relations between people and firms (network) in an industry and corresponding policy field, thus reinforcing a local conservative regime (context) that constrains further adjustment of local agents. The ossified institutional context may result in “political lock-in.”


Second, long-term personal networks of local agents can result in similar reactions (action) to demand changes and technological opportunities. A homogeneous view of the world caused by intensive social networks in clusters may be the consequence of “cognitive lock-in.” Third, the stable demand for products may fixate the localized social division of labor and support a rigid economic context. The enduring fragmentation of activities among firms can result in shortcomings in the local agents’ learning processes and investment decisions regarding R&D. By exclusively concentrating on certain activities, the local agents’ accumulation of knowledge becomes biased, and absorptive capabilities with respect to new knowledge may become more restricted. In Grabher’s (1993) classical typology, this rigidity of interfirm connections (economic context) generates “functional lock-in.”

II. Virtuous Cycles. In contrast to the above, virtuous interrelations between the three pillars can develop that have positive effects, as illustrated by the expansive interactive movement of the three pillars in Fig. 2 (Bathelt et al. 2018). Agents with overlapping knowledge bases are, for instance, motivated to cooperate and communicate. Through the action and interaction of diversified agents, knowledge circulates in clusters, ideas collide, and innovation becomes more likely. In turning innovative ideas into business successes, the agents’ relationships (network) that have been established in previous interaction are strengthened. Some commercialization of innovation may fail, but there are also successes, which may reorder the existing industrial structure (economic context) and change the existing cluster path. Successful cooperation of agents not only results in economic returns to innovation but may also establish new interpretations of the context within which the agents are situated, while producing important knowledge about their strengths and weaknesses. Agents with enhanced reflexive capability with respect to their context are more likely to act in anticipation of, rather than react to, future changes. At the regional level, clusters with pre-active agents across different networks are characterized by high adaptability. This can lead to dynamic processes of path creation.

Along with dynamic interactions of agents, mutual expectations regarding the coordination of actions may turn into unconscious routines and norms (institutional context), which become new components of the overall intangible regional assets. In the long run, these regional assets – both social networks and routines of doing business – are thus constructed, sustained, and altered through social reproduction. In Giddens’ (1984) sense, the interaction of context, network, and action in virtuous cycles actively drives a regionalized structuration process.

6 Conclusions

This chapter started by discussing recent relational and evolutionary perspectives in economic geography, arguing that it is useful to integrate the two approaches in order to combine their strengths rather than playing them off against one another.


Applying these approaches to the study of regional industry clusters, it is suggested that both have shortcomings if used in isolation: while relational conceptualizations that focus on the social relations and structural dimensions of clusters tend to neglect aspects of cluster dynamics, evolutionary approaches pay insufficient attention to the underlying structure of social relations in clusters. To overcome this, we suggest a tripolar framework of cluster evolution that presents a combined relational-evolutionary perspective. Some elements of this framework are also reflected in the adaptive cluster model described by Martin and Sunley (2011) – albeit at the expense of assuming a predefined natural cycle. We believe that the tripolar approach provides important insights about network dynamics and cluster evolution in a spatial perspective.

First, the concept of network is relational in nature and should be interpreted in a contingent way (Bathelt and Glückler 2011, 2017). Research on networks in clusters has focused on the intensity of existing linkages, generally assuming that such ties are responsible for regional success or failure. Whether ties are strong or weak, a relational-evolutionary perspective is skeptical that such a static interpretation of networks can account for multifaceted regional developments. In the tripolar framework, network is only one pillar of the entire system and changes over time in interaction with the other pillars of context and action. One has to consider the dynamics of the whole framework to be able to properly evaluate the impact of strong or weak ties on cluster evolution. A specific network structure, for example, that supports a cluster’s growth in one instance may turn out to be detrimental to regional competitiveness in a different setting.

Second, local traditions of action and interaction need to be evaluated in the specific context in which they matter for cluster evolution. Contextualized interpretations provide a perspective to understand why history matters in a nondeterministic way – a thorny issue in evolutionary economic geography. With new political-economic contexts, for instance, new practices of interaction among individuals and organizations can form and become new elements of local structures and traditions. But not all practices of interaction develop into key elements of “regional assets.” The degree to which a regional path can be established by local action depends on the specific context within which the local agents are situated (Storper 2009).

Third, the evolution of clusters is shaped by the aggregated action of local agents, as well as by the unintended consequences of this action. Since some contexts are beyond the control of local agents, action may have unintended effects that shape future settings and affect individual and collective action in the next round. Consequently, the integrated relational-evolutionary framework rejects a normative model of cluster evolution or cluster life cycles, especially since there are unexpected strategic actions that may, in the end, significantly alter the trajectory of clusters.

In sum, the tripolar framework conceptualizes cluster evolution through systematic interrelationships and ongoing feedbacks between context, network, and action. Focusing on the interdependencies of these pillars, the framework demonstrates the value added of combining relational, network-focused, and evolutionary approaches in cluster research.


A relational component in the tripolar framework helps explain why clusters evolve, thus avoiding the deterministic elements of previous cyclical and evolutionary approaches. Further, an evolutionary perspective serves to extend the interpretation of local relations from a traditionally static to a dynamic level of analysis. As an illustration of relational-evolutionary theorization on a specific topic, the tripolar framework reveals the potential of combining these different approaches to deepen our understanding of turbulent regional worlds. This chapter may therefore be regarded as an invitation to integrated relational-evolutionary theorizing in economic geography.

7 Cross-References

▶ Clusters, Local Districts, and Innovative Milieux
▶ Path Dependence and the Spatial Economy: Putting History in Its Place
▶ Systems of Innovation and the Learning Region

Acknowledgements This chapter, to which both authors have contributed equally, is based on a more extensive study (Li et al. 2012) and an earlier working paper (Li and Bathelt 2011). We would like to thank Andres Rodríguez-Pose for his encouragement and Manfred Fischer and Peter Maskell for thoughtful comments.

References

Bathelt H, Glückler J (2011) The relational economy: geographies of knowing and learning. Oxford University Press, Oxford
Bathelt H, Glückler J (2017) Relational research design in economic geography. In: Clark GL, Feldman MP, Gertler MS, Wójcik D (eds) The new Oxford handbook of economic geography. Oxford University Press, Oxford, pp 179–195
Bathelt H, Malmberg A, Maskell P (2004) Clusters and knowledge: local buzz, global pipelines and the process of knowledge creation. Prog Hum Geogr 28(1):31–56
Bathelt H, Cantwell J, Mudambi R (2018) Overcoming frictions in trans-national knowledge flows: challenges of connecting, sense-making and integrating. J Econ Geogr 18(5):1001–1022
Boggs JS, Rantisi NM (2003) The ‘relational’ turn in economic geography. J Econ Geogr 3(2):109–116
Boschma R, Iammarino S (2009) Related variety, trade linkages and regional growth in Italy. Econ Geogr 85(3):289–311
Bresnahan T, Gambardella A, Saxenian A (2001) ‘Old economy’ inputs for ‘new economy’ outcomes: cluster formation in the new Silicon Valleys. Ind Corp Change 10(4):835–860
Cohen WM, Levinthal DA (1990) Absorptive capacity: a new perspective on learning and innovation. Adm Sci Q 35(1):128–152
Faulconbridge JR (2017) Relational geographies of knowledge and innovation. In: Bathelt H, Cohendet P, Henn S, Simon L (eds) The Elgar companion to innovation and knowledge creation. Edward Elgar, Cheltenham/Northampton, pp 671–684
Fitjar RD, Rodríguez-Pose A (2011) Innovating in the periphery: firms, values and innovation in Southwest Norway. Eur Plan Stud 19(4):555–574
Frenken K, van Oort FG, Verburg T (2007) Related variety, unrelated variety and regional economic growth. Reg Stud 41(5):685–697


Gertler M (2004) Manufacturing culture: the institutional geography of industrial practice. Oxford University Press, Oxford
Giddens A (1984) The constitution of society: outline of the theory of structuration. Polity, Cambridge
Glückler J, Panitz R (2016) Relational upgrading in global value networks. J Econ Geogr 16(6):1161–1185
Grabher G (1993) The weakness of strong ties: the ‘lock-in’ of regional development in the Ruhr area. In: Grabher G (ed) The embedded firm: on the socio-economics of industrial networks. Routledge, London, pp 255–278
Granovetter M (1985) Economic action and social structure: the problem of embeddedness. Am J Sociol 91(3):481–510
Hadjimichalis C (2006) The end of third Italy as we knew it? Antipode 38(1):82–106
Hassink R, Klaerding C (2009) Relational and evolutionary economic geography: competing or complementary paradigms? Papers in evolutionary economic geography #09.11, Urban & Regional Research Centre, Utrecht University, Utrecht
Henning M (2019) Time should tell (more): evolutionary economic geography and the challenge of history. Reg Stud 53(4):602–613
Henning M, Stam E, Wenting R (2013) Path dependence research in regional economic development: cacophony or knowledge accumulation? Reg Stud 47(8):1348–1362
Iammarino S, McCann P (2006) The structure and evolution of industrial clusters: transactions, technology and knowledge spillovers. Res Policy 35(7):1018–1036
Klepper S (1997) Industry life cycles. Ind Corp Change 6(1):145–181
Kogler D (ed) (2015) Evolutionary economic geography: theoretical and empirical progress. Routledge, Abingdon/New York
Krugman P (1991) Geography and trade. MIT Press, Cambridge
Li P-F, Bathelt H (2011) A relational-evolutionary perspective of cluster dynamics. SPACES online, vol 9, issue 2011-02. Toronto/Heidelberg. www.spaces-online.com
Li P, Bathelt H (2018) Location strategy in cluster networks. J Int Bus Stud 49(8):967–989
Li P-F, Bathelt H, Wang J (2012) Network dynamics and cluster evolution: changing trajectories of the aluminium extrusion industry in Dali. J Econ Geogr 12(1):127–155
MacKinnon D, Cumbers A, Pike A, Birch K, McMaster R (2010) Evolution in economic geography: institutions, political economy and adaptation. Econ Geogr 85(2):129–150
Malmberg A, Maskell P (2002) The elusive concept of localization economies: towards a knowledge-based theory of spatial clustering. Environ Plann A 34(3):429–449
Martin R (2010) Rethinking regional path dependence: beyond lock-in to evolution. Econ Geogr 86(1):1–27
Martin R, Sunley P (2006) Path dependence and regional economic evolution. J Econ Geogr 6(4):395–437
Martin R, Sunley P (2011) Conceptualizing cluster evolution: beyond the life cycle model? Reg Stud 45(10):1299–1318
Maskell P, Malmberg A (2007) Myopia, knowledge development and cluster evolution. J Econ Geogr 7(5):603–618
Menzel M-P, Fornahl D (2009) Cluster life cycles – dimensions and rationales of cluster evolution. Ind Corp Change 19(1):205–238
Nohria N (1992) Introduction: is a network perspective a useful way of studying organizations? In: Nohria N, Eccles RG (eds) Networks and organizations: structure, form, and action. Harvard University Press, Cambridge, pp 1–22
Owen-Smith J, Powell WW (2004) Knowledge networks as channels and conduits: the effects of spillovers in the Boston biotechnology community. Organ Sci 15(1):2–21
Saxenian A (2006) The new argonauts: regional advantage in a global economy. Harvard University Press, Cambridge


Scott AJ (2006) Origins and growth of the Hollywood motion-picture industry: the first three decades. In: Braunerhjelm P, Feldman M (eds) Cluster genesis: technology-based industrial development. Oxford University Press, Oxford, pp 17–38
Storper M (2009) Regional context and global trade. Econ Geogr 85(1):1–21
Storper M, Salais R (1997) Worlds of production: the action framework of the economy. Harvard University Press, Cambridge
Storper M, Venables AJ (2004) Buzz: face-to-face contact and the urban economy. J Econ Geogr 4(4):351–370
Sydow J, Lerch F, Staber U (2010) Planning for path dependence? The case of a network in the Berlin-Brandenburg optics cluster. Econ Geogr 86(2):173–195

Path Dependence and the Spatial Economy: Putting History in Its Place

Ron Martin

Contents
1 Introduction
2 Path Dependence as Self-Reinforcing Spatial Economic “Lock-In”: What Does It Mean and How Common Is It?
3 Rethinking Path Dependence: From “Lock-In” to Ongoing Path Evolution
4 Toward a “Developmental–Evolutionary” Model of Path Dependence in the Spatial Economy
5 Conclusion
6 Cross-References
References

Abstract

The concept of path dependence has rapidly assumed the status of a “fundamental principle” in the new paradigm of evolutionary economic geography that has emerged over the past few years. This chapter reviews the interpretation and use of this concept within this new field. The dominant interpretation has been that of “lock-in,” by self-reinforcing mechanisms, of particular (equilibrium) patterns of industrial location and regional specialization. This model is somewhat restrictive, however, and does not capture the full repertoire of ongoing path-dependent evolutionary trajectories that can be observed in the economic landscape. In response to this limitation, and drawing both on revisionary approaches to path dependence in political science and historical sociology and on advances in the field of evolutionary economic geography, the chapter suggests a “developmental–evolutionary” model of path dependence that includes “lock-in” as a special case but is also more general in its application and relevance.

R. Martin (*)
Department of Geography and St. Catharine’s College, University of Cambridge, Cambridge, UK
e-mail: [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_34


Keywords

Economic landscapes · Regional development · History · Path dependence · Lock-in · Evolution

1 Introduction

Over the past three decades, the concept of path dependence has assumed a key explanatory role in a wide spectrum of social sciences and is now part of the standard lexicon of any approach that has pretensions to being “evolutionary” in orientation. Any evolutionary perspective on the socioeconomy starts from an elementary but important fact, namely, that, in each period, a socioeconomy inherits the legacy of its own past. Once this is acknowledged, we are faced with the possibility that “history matters.” The notion of path dependence is intended to imbue this idea with some degree of conceptual and explanatory rigor and hence to go beyond simple narratives that merely describe historical effects.

The idea of path dependence in its modern form was first developed by the economists Paul David and Brian Arthur in the 1980s and early 1990s (David 1985, 1986; Arthur 1989, 1994) to explain technology adoption processes and industry evolution. Arthur’s discussion of the concept is particularly interesting, since he made explicit reference to the importance of path dependence in shaping the location of industry (Arthur 1994), a point also emphasized around the same time by Paul Krugman, who argued that:

If there is one single area of economics in which path dependence is unmistakable, it is in economic geography – the location of production in space. The long shadow cast by history over location is apparent at all scales, from the smallest to the largest. (Krugman 1991, p. 80)

Although an early use of path dependence ideas to explain regional development was Grabher’s (1993) study of the Ruhr in Germany, it has only been over the past 15 years or so that the notion has really been taken up by economic geographers. In particular, the concept has assumed central importance in the theoretical and empirical contributions to the new paradigm of evolutionary economic geography that has recently emerged (see Boschma and Martin 2007, 2010; Martin 2010a) and has come to be regarded as a key “organizing concept” for understanding how the economic landscape evolves over time.

The reason for this take-up of the concept of path dependence in evolutionary economic geography is not hard to explain. As conceived by David and Arthur, path dependence refers to a particular type of process that leads to the asymptotic convergence of an economic form or structure to a stable, “locked-in” configuration that can only be changed or “de-locked” by some sort of external shock or disturbance (see also Castaldi and Dosi 2006). Likewise, geographers argue, industrial location patterns and regional economic specializations show a similar process of self-reinforcing “lock-in.”


We know that regional industrial structures, local economic specialisms, urban locations, and geographical patterns of development do not suddenly spring up overnight, but have their origins in the past and are built up over time, in many cases spanning several decades. Neither, typically, do spatial economic structures and configurations change suddenly. It is clear that at any one point in time, the spatial structure of an economy is very similar to, and highly influenced by, the structure in the immediate and even less immediate past. The economic landscape we observe at any point in time has been shaped by the historical adjustment path taken to it: it reflects its past development. Put another way, the economic landscape evolves as a consequence of its own history. In this sense, as Krugman argues (in the quote above), it is possible to argue that history casts a “long shadow” over industrial location patterns.

Yet, intuitive and appealing though this invocation of path dependence as a process shaping the spatial economy might be, it is not a straightforward notion. Even a “lock-in” definition can be given different interpretations and representations, and “lock-in,” if present, can be considered a positive feature or a negative one (see Martin and Sunley 2006). Further, different authors have used different formal models to represent path dependence, and these imply somewhat different definitions of the process. More generally, in the last few years, the customary definition of path dependence as “lock-in” has itself come under increasing scrutiny, especially in political science, historical sociology, and management and organization studies. In these fields, there has been a growing reaction against the original “canonical” model of path dependence as articulated by David and Arthur. Critics of this conceptualization argue that “lock-in” implies stability, stasis, or no change, or at the very least a particular form of evolution, namely, one in which long periods of stability are separated by periodic phases of rapid and disruptive change, whereas in reality many social and political structures, and many product and process technologies within business organizations, evolve more or less continuously yet still display path dependence. Accordingly, these writers have put forward alternative or revised models of path dependence that they believe more faithfully capture the actual varieties and patterns of path-dependent development observed in socioeconomic and technological systems. These alternative models may not produce the very “strong” form of path dependence associated with “lock-in,” but rather depict path dependence as an ongoing or “unfolding” process of adaptation or mutation whereby purposive behavior by individual agents, drawing on the structures and outcomes inherited from the past, actively reshapes those structures and outcomes: path dependence and path adaptation become inextricably linked.

Economic geographers and regional analysts do not seem to have been fully appreciative of these debates, yet they have important implications for how the concept of path dependence can be used to understand how regional and local economies evolve (see Martin and Sunley 2006; Martin 2010a, 2012a). And here, additional issues also intervene. Regional and local economies are complex, heterogeneous, and highly open systems, often encompassing several different industries and activities or subsystems, and are unlike the singular technologies, institutions, or products that are so often the subject of path dependence analysis in other disciplines. This complexity begs the question of what it is in regional and local economies that is path dependent.


In addition, and potentially of critical importance, is the question of whether path dependence is simply a general process or dynamic that shapes geographical economic outcomes, or is itself a process that is shaped by geographical context: in other words, is path dependence to some extent place dependent (Martin and Sunley 2006)? Still further, how does the concept of path dependence relate to our existing theories of (uneven) regional development? One of the criticisms leveled at the expanding paradigm of evolutionary economic geography is that its advocates seem more intent on constructing a separate perspective than on integrating their evolutionary ideas and concepts with existing approaches, some of which, it is claimed, already take history seriously in one way or another (Mackinnon et al. 2009; Coe 2011; Oosterlynck 2012). This issue would seem to apply a fortiori to path dependence. Should the aim be to construct a distinct “path dependence theory” of regional growth and development, and what would such a theory look like? Or should the objective be to explore both the implications of the concept for existing regional theories and what those theories imply for the idea of path dependence?

It is certainly not possible to take up all of these issues in detail in this short review of the “state of the art,” and what follows is necessarily somewhat selective and partial in coverage. I begin by summarizing the “canonical” interpretation of path dependence as “self-reinforcing lock-in” and how far and in what ways this model is applicable to the spatial economic landscape. I then move on to examine some of the alternative views of path dependence that have been emerging in certain social and historical sciences and what these interpretations imply for how we might think about the idea of regional path dependence. Building on this discussion, I then suggest what might be called a “developmental–evolutionary” view of path dependence, an interpretation that, while capable of including the basic “lock-in” model as a special case, allows for a much wider repertoire of evolutionary outcomes.

2 Path Dependence as Self-Reinforcing Spatial Economic “Lock-In”: What Does It Mean and How Common Is It?

At least five formal models (and associated interpretations) of path dependence can be identified from the economic literature (see Table 1). Both David and Arthur conceptualize path dependence in terms of the limiting distributions of non-ergodic stochastic processes. In David’s accounts, path dependence is likened to a Markov chain process with one or more absorbing states. Which absorbing state (long-run limiting equilibrium distribution) such a system will end up in will depend on where it started – “history matters” – but once in that state, the system cannot escape: it becomes “locked” into that particular equilibrium outcome. In David’s words:

Small events of a random character—especially those occurring early on the path—are likely to figure significantly in ‘selecting’ one or other among the set of stable equilibria, or ‘attractors.’ (David 2005, p. 151)


The elaboration of theories around the core concept of path dependent dynamics . . . encourages and enables economists to entertain the possibility that, in place of a unique equilibrium-seeking dynamic, they should envisage a process that is seeking an historically-contingent equilibrium. (David 2007, p. 2)

In Arthur’s formalization, path dependence is represented by a nonlinear Polya urn process, which possesses a multiplicity of possible stable fixed (equilibrium) outcomes (structures) of which one will be dynamically “selected” and “locked in”:

Often there is a multiplicity of patterns that are candidates for long-term self-reinforcement: the cumulation of small events early on ‘pushes’ the dynamics into the orbit of one of these and thus ‘selects’ the structure that the system eventually locks into. (Arthur 1994, p. 33)

Table 1 Five formal models of path dependence

1. Formal model: Path dependence as an absorbing Markov chain process, in which a system has a probability transition matrix with more than one absorbing state (p_ii = 1), so that the system converges on a final distribution across states that depends on the initial (starting) distribution.
Example application: Model implied in David’s writings on the “lock-in” of technologies or institutional standards to historically fixed and unchanging forms, which may or may not be the most (market) efficient.

2. Formal model: Path dependence as a nonlinear Polya (urn) stochastic process, in which the probability of an outcome of a given type in a given period increases the probability of generating that same outcome in the next period (a “proportions to probability” mapping). The model converges to an equilibrium distribution that is dependent on the initial (random) distribution.
Example application: Model used by Arthur to generate the progressive “lock-in” of locational distributions of industries or cities to long-run stable spatial patterns that become self-reproducing.

3. Formal model: Path dependence as a unit root process, in which a dynamic system can be reduced to a difference equation in the key dependent variable, say X, which possesses a unit root, so that the value of X_t in any period t embodies “memory” of its previous values and is thus dependent on its prior adjustment path.
Example application: Model used by various authors to study short- and long-run macro-dynamic phenomena. Such unit root models do capture at least one key aspect of path dependence, namely, the propensity for transitory random event shocks to have permanent effects.

4. Formal model: Path dependence as a recursive cumulative causation process, in which recursive feedbacks among the structural components or relationships of a system reinforce a given trajectory or pattern of development. No long-term equilibrium is implied.
Example application: Model used to generate Kaldorian-type, export-driven models of national and regional growth, with recursive dynamics (see Setterfield 2009), for example, X_t → Y_t → Z_{t+1} → X_{t+1}, and so on.

5. Formal model: Path dependence as the existence of multiple steady-state equilibria, the unique dynamic path to any specific one of which will depend on the initial geographical distribution of economic activity.
Example application: Formal spatial economic models in which the strength of agglomeration externalities and of trade and migration frictions, under different “initial conditions,” is used to determine long-run steady-state outcomes (“path dependence”) (see, e.g., Allen and Donaldson 2018).
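The “history matters” property of the unit root formulation (model 3 in Table 1) can be made concrete with a short simulation. The sketch below is our own minimal illustration, not part of the chapter; all names and parameter values are assumptions chosen for clarity. It compares a unit root process with a mean-reverting alternative, subjecting both to the same one-off shock:

```python
# Illustrative sketch: why a unit root process "remembers" transitory shocks.
# X_t = rho * X_{t-1} + e_t; rho = 1.0 gives the unit root case of Table 1
# (model 3), rho < 1 gives a mean-reverting contrast. All names and parameter
# values are illustrative assumptions, not taken from the chapter.
import random

def simulate(rho, shock_at=50, shock_size=5.0, periods=200, seed=42):
    rng = random.Random(seed)
    x, path = 0.0, []
    for t in range(periods):
        e = rng.gauss(0.0, 1.0)
        if t == shock_at:          # a single transitory "small event"
            e += shock_size
        x = rho * x + e
        path.append(x)
    return path

unit_root = simulate(rho=1.0)       # path dependent: the shock never decays
mean_reverting = simulate(rho=0.7)  # history "washes out" over time

# With the same seed, the only difference is rho: the shock is embedded
# permanently in the unit root path but fades from the mean-reverting one.
print("unit root, final value:      %6.2f" % unit_root[-1])
print("mean reverting, final value: %6.2f" % mean_reverting[-1])
```

The contrast captures the sense in which a unit root process carries its entire adjustment path forward: with no equilibrium pulling the variable back, every past event remains embedded in the current value.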

A third way in which path dependence has been modeled is in terms of a dynamic system which can be reduced to a difference equation in some key dependent variable, say X, which possesses a unit root, meaning that the value of X_t in any period t embodies “memory” of its previous values and is thus dependent on its entire prior adjustment path. Such a unit root system does not converge to any equilibrium value or state: instead, the “long-run outcome” is defined as such by virtue of the temporal distance from some initial starting state, that is, from X_0. Such an interpretation begs the question, of course, of what the “carrier of history” mechanism is that imbues the system or characteristic X with path dependence.

Fourthly, some authors (e.g., Setterfield 1998, 2009) provide this mechanism in terms of Kaldorian-type recursive cumulative causation models of national and regional growth, in which technological and institutional dynamics also play a key role. Again, in these structural difference equation models, no long-run equilibrium solution or state is implied.

Fifthly, a not dissimilar idea of self-reinforcing “lock-in” of a spatial economic structure into one of a number of possible multiple equilibrium patterns is to be found in new economic geography (NEG) and related spatial economic models. However, since many such early models were basically comparative static in nature, in which spatial agglomeration as one possible equilibrium outcome occurs instantaneously (for given assumptions about transport costs, wage functions, etc.), the often-made claim that those models incorporate “history” and “path dependence” (Krugman 1991) is questionable (see Martin 2010b; Garretsen and Martin 2010). More recent models, however, are more dynamic in nature. For example, in their formal NEG-inspired model of the geography of path dependence, Allen and Donaldson (2018) show that “lock-in” to one of several possible multiple steady-state equilibrium spatial distributions of economic activity depends crucially on six key “elasticities”: the elasticities of interregional trade and migration responses to payoffs; two contemporaneous spillover parameters, one each for production and amenities; and two historical spillover parameters, again one each for production and amenities.

Thus far, models of this sort have not figured in economic geographers’ studies of spatial or regional path dependence, even though they offer some potentially useful general insights into the role of path dependence in shaping economic landscapes. Instead, economic geographers have tended to rely almost entirely on the original notions of path dependence developed by David and Arthur. Arthur (1994) used his model to show how industrial location can be interpreted as a path-dependent process that progressively “locks into” a stable, fixed distribution (of shares) of firms across geographical regions. It is assumed that the autocatalytic or self-reinforcing mechanisms that generate path dependence (see Table 2 for the mechanisms identified by David and Arthur) have a spatial dimension, in that firms choosing where to locate are attracted by the presence of other firms in a region. Arthur describes two such models.


Table 2 Processes generating path-dependent “lock-in”: David’s and Arthur’s models compared

David’s model (“network externalities”):
1. Technical interrelatedness (the reinforcing effects of complementarity and compatibility among the different components of a technology and its use)
2. Economies of scale (the benefits associated with the increasing use of a technology – such as a decline in user costs – as the technology gains in acceptance relative to other systems)
3. The quasi-irreversibility of investments (the difficulties of switching technology-specific capital and human skills to alternative uses)

Arthur’s model (“increasing returns effects”):
1. Large initial fixed setup costs (in effect the inertia of sunk costs)
2. Dynamic learning effects (learning by doing or using and learning by interaction tend to entail positive feedbacks)
3. Coordination effects (which confer advantages to going along with other economic agents taking similar actions)
4. Self-reinforcing expectations (when the increased prevalence of a product, technology, process, or practice enhances beliefs of further prevalence)

Based on Martin (2010a)

In the “spin-off” version, the path-dependent geographical distribution of industry occurs through a process of local firm “spin-offs” from parent firms: this type of birth mechanism is argued to have characterized the US electronics and car industries. In the “agglomeration economies” version, if one region by chance gets off to a good start, its attractiveness and the probability that it will be chosen will be enhanced, further firms may then choose this region, and it becomes yet more attractive because of the emergence of various agglomeration economies and externalities, so that the concentration of firms there becomes self-reinforcing. If such agglomeration economies are unbounded, then the model predicts that all of the firms in the industry will eventually end up in one region: Arthur suggests Silicon Valley as a possible example of this sort of “locational monopoly.”

The point about both David’s and Arthur’s models is that according to chance (combined in some cases with necessity, such as the spatial distribution of raw materials or natural resources), a different path-dependent spatial outcome might have been obtained, with some other region becoming dominant. Thus, with different initial conditions (chance, random, or contingent events) combined with self-reinforcing path dependence effects, a multiplicity of possible equilibrium spatial economic structures can result. Whichever particular equilibrium (long-run) spatial economic structure becomes “locked in” is assumed to remain fixed until such time as it is “de-locked” by a disturbance of one kind or another. (Likewise, in NEG models, a shock – such as a reduction in transport costs or a policy intervention – can “de-lock” one equilibrium spatial distribution of economic activity and move the system to a different equilibrium pattern, what NEG theorists refer to as “locational hysteresis.”)
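Arthur’s “agglomeration economies” story can be mimicked with a few lines of simulation. The sketch below is a minimal illustration in this spirit only: the linear attachment rule, the number of regions, and all parameters are our own assumptions, not Arthur’s original specification.

```python
# Minimal sketch of an Arthur-style "agglomeration economies" urn process:
# each entering firm picks a region with probability proportional to the
# firms already located there, so early random choices become self-
# reinforcing. The linear attachment rule and all parameters are our own
# illustrative assumptions, not Arthur's calibrated model.
import random

def locate_firms(n_regions=4, n_firms=10_000, seed=None):
    rng = random.Random(seed)
    counts = [1] * n_regions            # seed each region with one firm
    for _ in range(n_firms):
        total = sum(counts)
        r = rng.uniform(0, total)       # choose a region in proportion to size
        cum = 0.0
        for i, c in enumerate(counts):
            cum += c
            if r < cum:
                counts[i] += 1
                break
    return [c / sum(counts) for c in counts]

# Different runs "lock in" to different long-run regional shares: history
# (the early sequence of chance locations) selects the outcome.
for run in range(3):
    shares = locate_firms(seed=run)
    print(f"run {run}: " + ", ".join(f"{s:.2f}" for s in shares))
```

With this linear rule the regional shares settle to a stable but unpredictable division that differs from run to run. Making the attachment probability rise more than proportionally with a region’s size (unbounded agglomeration economies) would instead drive the “locational monopoly” outcome Arthur associates with cases like Silicon Valley.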


Table 3 Some possible sources of regional path dependence

1. Natural resource based: regional development path shaped and constrained by dependence on a particular raw material or resource
2. Sunk costs of local productive, physical, and infrastructural assets: durability and quasi-irreversibility of local specialized capital equipment, infrastructures, or built forms
3. Local external economies of industrial specialization: Marshallian-type dynamic externalities and both traded and untraded interdependencies associated with specialized local industrial districts or clusters
4. Local technological lock-in: development of a distinctive local technological regime or innovation system through local collective learning, cognitive inertia, knowledge transfers, and imitative behavior
5. Localized “spin-off” firm birth process: local parent firms are sources of “spin-off” firms in similar or related activity, possibly supplying the original parent firms, leading to the buildup of a local industrial specialization or cluster
6. Agglomeration economies: generalized self-reinforcing economies associated with the spatial concentration of activity, including product and labor market effects and thick networks of suppliers, services, and information
7. Interregional linkages and interdependencies: development path in one region shaped by and dependent on development paths in another region, for example, because of interindustry linkages (such as acting as a specialist supplier of inputs to, or depending on inputs supplied by, an industry in another region)

Based on Martin and Sunley (2006)

Construed in these terms, then, the economic landscape evolves by the emergence and “lock-in” of historically contingent, long-run equilibrium locational patterns of industrial activity and specialization that are periodically disrupted and, eventually, replaced by the path-dependent development of new historically contingent equilibrium patterns: in essence, an evolution characterized primarily by “punctuated equilibria” (David (2007, p. 187) explicitly aligns his path dependence model with this view of economic evolution).

How far does this version of path dependence capture real-world patterns of regional economic development (see Table 3)? To some extent, this depends on what it is we are looking at. Many industrial location patterns and local industrial specializations, once established, often do seem to be subject to, or to give rise to, “self-reinforcing” mechanisms and processes that “lock in” those patterns. However, this conception also raises several questions (Martin 2010a). For one thing, the model predicts a progressive “lock-in” to a long-run equilibrium stable state – in the sense of a stable pattern of localization of an industry or pattern of local sectoral specialization. Indeed, David (2005, 2007) actually refers to “path-dependent equilibrium economics.” But just how long is the “long run”? The problem with using a formal stochastic model (such as an absorbing Markov process or a nonlinear Polya urn process) to define path dependence is that there is no correspondence between the (logical) “convergence to equilibrium time” in such models and the real history of actual real-world processes of economic activity and development. Nor do real-world economies necessarily ever reach equilibrium states, even of a


“historically contingent” kind. The idea of an equilibrium, or of multiple equilibria, is an imposed assumption, not a proven fact. As Setterfield (2009) argues, the very process of an actually existing economy approaching a stable long-run equilibrium position or state is itself likely to stimulate purposive individual behavior – “innovation,” as he calls it – by economic actors to change their activities and thus prevent that economy from becoming fully locked into that particular equilibrium state. This is the sentiment expressed in another way by Metcalfe et al. (2006), who argue that because knowledge is constantly changing, capitalism can never be in a state of long-run equilibrium but is instead in a state of constant “restlessness” in which some sort of innovation and structural change occurs more or less continuously. To be sure, such innovation and structural change may at times be slow and incremental, or may occur in fits and starts, but the idea of an economy being in any type of stable or fixed equilibrium, essentially a state of stasis, is incompatible with the nature of capitalism as a dynamic, evolving system, as a process of continual “creative destruction.”

In what sense, then, is it possible to talk of spatial or regional “lock-in”? As mentioned above, from one vantage point, the locational patterns of industry and specialization can be viewed as being characterized by a certain degree of “spatial equilibrium” or “lock-in” (what economic geographers used to call “inertia”). However, the longer, in historical time, the “long run” is specified or permitted to be, the less likely the spatial structure of the economy is to remain in a stable, unchanging state. Moreover, even if the spatial patterns of industries across regions and locations are stable (“locked in”), this need not mean that the particular industries and specialisms within individual regions and locations are in a state of stasis or stability, with no product, technological, or organizational change. However, this is precisely what Krugman (1991) ignores in his discussion of the path dependence and “persistent dominance” of the US Manufacturing Belt. While US manufacturing has long been concentrated in a relatively small area of the North East and East North Central regions, the nature of the manufacturing activity conducted there has evolved considerably over time. The geographies of production may shift and change relatively slowly, but the firms and industries in a region may be characterized by significant ongoing endogenous change. Competition from other producers in other regions, or from other producers within the same region, is a constant source of pressure on a given region’s firms to upgrade and modernize their products, to introduce new variants or ranges of products, and to improve their productive efficiency through innovation. If a region’s firms fail to upgrade and innovate sufficiently, they face losing competitiveness and market share and even going out of business. This process need not occur suddenly: economic landscapes are littered with industrial districts and clusters that have undergone slow and protracted decline. Some such districts and clusters may eventually disappear altogether. Others, however, may survive and even undergo renewed success, in some cases much smaller in size and serving specialist niche markets, and in other instances undergoing renewed expansion based on shifting into related or complementary specialisms (for a discussion of the different evolutionary trajectories that clusters may undergo, see Martin and Sunley 2011).


A “punctuated equilibrium” model of path-dependent regional “lock-in” may not in fact fit many actual regional and local experiences.

Yet further, unlike a singular technology or institutional standard (of the sort favored by David and many others in their discussions of path dependence), a given industry in a region (even if it is the only industry) is typically composed of numerous firms, among which there is bound to be heterogeneity of products, production methods, innovativeness, business strategy, and so on. This heterogeneity or variety – or “composition” effect – also suggests that the idea of a regional industry becoming “locked” into a stable equilibrium state, a state of stasis, is unlikely to be the norm. Very specific local circumstances are required for this sort of outcome to occur, such as a local industry based entirely on a local natural resource, or a local industry whose firms are closely linked by a very high degree of technological interrelatedness – for example, a form of production involving a detailed horizontal interfirm division of labor – such that a change in one firm would require a change among all or almost all other firms, which might prevent any single firm from changing in isolation. Such examples do obviously occur. But in most cases, the “lock-in” of locational patterns of industrial specialization across space by no means implies or leads to the technological or product “lock-in” of the firms in individual places. The technological and product bases of firms, and thus industries, can and do change and develop over time, and the trajectories along which such development occurs can and do display path dependence, in that the improvements, innovations, and adaptations that firms make to their products and technologies invariably build upon and are shaped by their existing products and technologies, which in turn evolved out of previous versions.

Essentially, then, as Page (2006) points out, it is possible to distinguish between two main types of path dependence: path dependence in which long-run equilibria depend on history and path dependence in which outcomes are history dependent. Equilibrium path dependence is where the long-run distribution over outcomes depends on past outcomes: it is all about the historically contingent selection of, and self-reinforcing convergence to, one of a number of possible limiting distributions over outcomes, and it links directly to the idea of progressive “lock-in.” Outcome path dependence is where the outcome in a period depends on past outcomes. Equilibrium path dependence implies outcome dependence, since if the long-run equilibrium distribution over outcomes depends on the past, then so must the outcomes in individual time periods. But outcome dependence does not imply equilibrium dependence: history matters in that current outcomes can be related, to some degree or other, to previous outcomes, and thus different previous outcomes are likely to have led to different present outcomes – the process is path dependent – but the path of outcomes over time need not converge to any long-run equilibrium or stable outcome. What this opens up is the possibility of a wider interpretation or conceptualization of path dependence, in which “lock-in” is only one, and a particularly “strong,” possibility and in which pathways of technological, industrial, and regional development themselves evolve and unfold over time in an outcome-dependent manner. This suggests the need for a wider conception of path dependence.
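Page’s distinction can also be seen numerically. In the toy sketch below (our own illustration; the two processes and all parameters are assumptions, not Page’s examples), a simple Polya urn exhibits equilibrium path dependence – its share settles toward a run-specific limit selected by early draws – while a driftless random walk exhibits only outcome path dependence, always reflecting its past but never converging to anything:

```python
# Toy contrast between Page's two types of path dependence (our own
# illustration; processes and parameters are assumptions, not Page's):
# - equilibrium path dependence: a Polya urn whose share settles toward a
#   run-specific limit selected by early history;
# - outcome path dependence: a driftless random walk whose outcomes depend
#   on past outcomes but never converge to any stable value.
import random

rng = random.Random(7)
red, blue = 1, 1          # Polya urn: draw a ball, return it plus a copy
walk = 0.0                # random walk: X_t = X_{t-1} + e_t
for t in range(1, 100_001):
    if rng.random() < red / (red + blue):
        red += 1
    else:
        blue += 1
    walk += rng.gauss(0.0, 1.0)
    if t in (100, 10_000, 100_000):
        print(f"t={t:>6}: urn share of red = {red/(red+blue):.3f}, "
              f"walk = {walk:8.1f}")
# Typical output: the urn share barely moves after the early periods (it is
# converging to a limit "chosen" by history), while the walk keeps wandering.
```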

3 Rethinking Path Dependence: From “Lock-In” to Ongoing Path Evolution

Over the past few years, an increasing number of political scientists, historical sociologists, and management scientists have begun to explore what form(s) such a wider conception of path dependence might take. The “lock-in” model of path dependence has frequently been adopted in these disciplines to describe how political systems, social institutions, management practices, and the like evolve over time. But a growing corpus of empirical work has indicated that the evolution of these systems, structures, and organizations may in fact be much more ongoing and incremental than the “canonical” path dependence model would suggest. Put another way, it is argued that the standard path dependence model overemphasizes stability at the expense of ongoing change, mutation, adaptation, branching, and reorientation.

In historical sociology and political science, a number of amendments and revisions to the standard “lock-in by self-reinforcing effects” model of path dependence have been proposed. One such is the so-called “reactive sequence” model, intended to cover forms of path dependence that are not straightforwardly self-reproducing (see, e.g., Mahoney 2000, 2006; Mahoney and Schensul 2008). Reactive sequences are chains of temporally ordered and causally connected events, in which each event in a sequence is both a reaction to antecedent events and a cause of subsequent events: history again matters. Such sequences differ from increasing returns-type sequences in that while the latter are characterized by processes of reproduction that reinforce early events, reactive sequences are marked by processes that transform, or even reverse, early events and outcomes. In a reactive sequence view of path dependence, initial contingent events, which are typically considered key historical “breakpoints,” trigger subsequent development not by reproducing a given pattern but by setting in motion a chain of tightly linked reactions. The contingent event may itself be the intersection point of two or more prior sequences. The key point is that the focus is on how development paths actually evolve, as a sequence of connected occurrences and reactions, of opportunities and purposive behaviors, and of actual decisions and their consequences (Mahoney 2000, 2006; Mahoney and Schensul 2008). Hence, while a conventional path dependence model of the emergence of a particular mode or form of regional development – say the Silicon Valley example referred to by Arthur in his writings – would focus on the self-reinforcing increasing returns effects associated with the spatial agglomeration of certain types of firms, workers, and knowledge, a reactive sequence perspective would focus on the key actors and actions involved and on how sequential decisions have shaped the region’s (say, Silicon Valley’s) evolutionary developmental trajectory.

In other revisionary perspectives, three main mechanisms have been suggested that operate at the micro-level to impart ongoing change to the path-dependent evolution of institutions and economic organizations: “layering,” “conversion,” and “recombination” (see, e.g., Boas 2007; Schneiberg 2007; see also Martin 2010a).


In a layering process, an institution or other such system changes gradually by the addition and accretion of new rules, procedures, or structures to what already exists. Each new “layer” (rule, etc.) constitutes only a small change of the institution as a whole, but this process can be cumulative over time, so that while path dependent, the institution also evolves, leading to the mutation or even transformation of the institution’s fundamental nature. Not only does the addition of a new rule or procedure to an institution depend on there being existing network externalities for its success, but this addition changes those externalities incrementally – and sometimes more substantially – in the process. Continuous incremental institutional change is thus both path dependent and path evolving.

A second process by which ongoing path-dependent change may occur in political and sociological systems is “conversion,” that is, the reorientation of an institution or other such system in terms of form, function, or both. Conversion can occur in two ways. First, the addition of new “layers” (new rules, procedures, and so forth) is itself a source of institutional conversion or reorientation, since the addition of new rules or procedures typically arises from the need or desire to alter an institution to serve new functions, roles, or imperatives. And the addition of a new layer may arise from, lead to, or necessitate the removal of an old layer. The second source of conversion is when the existing structures and arrangements of an institution are reoriented to serve new purposes, in response to external pressures or developments, or as part of a learning process by which existing rules are improved. No new rules or procedures as such need be added; rather, existing rules and procedures are realigned or modified. In some cases, however, the conversion of an institution may be possible only by means of a layering process. Although the recent political science literature has proposed that layering and conversion processes are separate and distinct mechanisms of incremental path-dependent institutional change, in reality they frequently coexist and interact. Moreover, while these mechanisms can be argued to be alternatives to explanations couched in terms of path dependence mechanisms, they are in fact consistent with such mechanisms – indeed, they depend on them for their adoption and success. But unlike the canonical model of path dependence, layering and conversion processes need not lead to “lock-in”: rather, they may well prevent “lock-in” from occurring.

Thirdly, other writers have proposed what they call a “recombinant” path dependence model. The basic idea is that any particular existing social–political–economic structure is, in effect, a system of resources and properties that actors can recombine and redefine, in conjunction with new resources and properties, to produce a new structure. Such recombination is a source of path dependence in that the resources that exist shape to some degree what changes can be made. The degree and nature of the “structured variety” that characterizes a socioeconomic system may thus be of some importance, since it will condition the range of resources that can be recombined (Boas 2007; Schneiberg 2007). This recombination of existing social and institutional resources can be incremental, but it can even play a role at times of radical change.


In the management sciences, too, interest has focused on deriving alternative conceptions of path dependence that escape the restrictions of the canonical “lock-in” model and allow for the ongoing evolution of a path. A key argument in this strand of literature is that standard path dependence models say little about agency – about how economic and other actors create, recreate, and alter paths (Garud and Karnøe 2001; Garud et al. 2010). The complaint is that the standard perspective on path dependence is that of an “outsider’s ontology”: the emphasis is on unpredictable contingencies, external increasing returns effects, and self-reinforcing nonlinear dynamics, which determine the behavior of actors who, once locked in, cannot escape unless some exogenous shock occurs. It is as if (local) economic actors are subject to some “higher-order logic” or “master plan” we call path dependence, over which they have no control. In contrast, it is argued, there is a need for another perspective on path dependence that embraces an “insider’s ontology,” that is, one which recognizes and assigns central importance to the purposive actions and behavior of actors. Purposive action is not only often responsible for the initial creation of a new path – which in the standard path dependence model is typically regarded as a happenstance or random event, “outside the knowledge” of the observer or, as Arthur (1994, p. 17) puts it, “beyond the resolving power of his [sic] ‘model’ or abstraction of the situation” – but also for how that path develops over time. Actors mobilize and draw upon the past (previous outcomes and experiences) in order to shape and fulfill their aspirations for the future: they may wish to repeat the past (to continue a particular form of activity) or to improve or move on from the past (by changing activity and behavior in some way). As Garud et al. (2010, p. 769) put it, “different visions of the future will lead to the mobilisation of the past in different ways. And these images of the future and mobilizations of the past will galvanise specific actions in the present.” Rather than “lock-in,” these authors argue, there is ever the possibility of creative destruction, with agents proactively innovating in order to move their activity forward under the pressure of competition and new opportunities. At the very least, economic agents learn, and contrary to the assumption made in the standard model of path dependence – Arthur’s assumption that learning leads to progressive imitation and widespread adoption (i.e., “lock-in”) – learning can equally lead to more or less continual evolution of a path.

Although some, including Garud and his coauthors, want to go as far as to argue that path creation and path dependence should be regarded as distinct, others (such as Sydow et al. 2009) view path dependence and path creation as complementary and argue that any process is driven by a mix of the two. This opens up the possibility of different forms and degrees of path dependence, according to the balance of this mix of processes. Even “locked-in” states depend on agents’ decisions and actions – in this case, to change nothing and continue as before – whereas in many instances there will be at least some agents, or groups of agents, whose intentional actions (entering into business, withdrawing from business, undertaking innovation, upgrading or redeveloping products, and so on) have the net effect that an industrial, technological, or local economic path will mutate over time. In other words, the heterogeneity of decisions and actions among heterogeneous actors is very likely to prevent “lock-in” from occurring and instead lead to the ongoing adaptation of a path.

4 Toward a "Developmental–Evolutionary" Model of Path Dependence in the Spatial Economy

Elsewhere, I have suggested that these explorations into alternative perspectives on path dependence are highly suggestive for how we should think about the idea of path dependence as a model of spatial economic evolution (Martin 2010a). I am not arguing that the processes operating in local or regional economies are identical to those shaping the development and evolution of institutional forms, merely that there are analogous processes at work in the former that resemble those in the latter and that these are worth exploring and elaborating (see Martin 2012a).

In the first place, while strong increasing returns effects may make for a self-reinforcing movement toward the "lock-in" or extension of a particular industrial path in a local or regional setting, other mechanisms analogous to "layering," "conversion," and "recombination" may make for mutation and adaptation of that path over time. New firms are created or added more or less continuously as a local industry grows and develops; they may be spin-offs from existing firms, entirely new ventures, or implants from outside the locality. At the same time, some existing firms fail or move out of the locality. The addition and subtraction of competing entities and the consequential change in the relative frequency of different entities in a system are key forces that generate variety, and variety is a fundamental principle of evolution. New firms in an industry are likely to employ more advanced techniques, offer competing and perhaps different variants of the industry's product or products, have different productivity and innovation profiles, and so on. The balance between entry, exit, and survival of firms may vary, of course, as the industry develops and will be driven by a selection process that is determined, in large part, by the relative competitiveness of the firms in their relevant external markets.

Like "layering," the idea of "conversion" also has an immediate relevance in a local industrial context. Changes to the characteristics of existing entities of a system are a key evolutionary mechanism. In the economic geographic case, it would refer to the ongoing innovation by firms in the local industry – in terms of new products, techniques, business organization, and the like – in response to market opportunities, competitive pressures, knowledge spillovers, and similar stimuli. The entry of new firms that employ newer techniques, different variants of products, and so on, by adding new elements to the local economy (i.e., "layering"), may, in turn, exercise a demonstration effect or spillover effect on existing firms, leading to the "conversion" (reorientation) of their activities. As in the case of institutional evolution, these local industry "layering" and "conversion" processes interact, and "conversion" may well entail "recombination"-type processes, whereby firms are able to draw upon some aspects of existing local economic resources, capabilities, and externalities (such as skilled labor, technology centers, and the like) to reconfigure and reorient their activities. To the extent that mechanisms of these sorts operate, the technological and product "portfolio" of the local industry as a whole can change more or less continuously over time.

Furthermore, as these changes cumulate, the network externalities that support and benefit the local firms in the industry will also change. The skills of the local labor force and the range of intermediaries, of suppliers, and of local supporting institutions – in fact, the whole gamut of local network externalities – may slowly evolve as the industrial path evolves. And driving the scale and direction of this evolution, of course, are the aspirations, reactions, and decisions of local actors, in firms, institutions, and other organizations.

This perspective on path dependence also allows for processes of "branching" to occur, whereby new, related sectors of activity emerge from and develop alongside, and perhaps even eventually replace, existing activities. The nature of such branching, itself a mechanism for creating variety or diversity in the local economy, will be path dependent to the extent that the new activities draw on competencies, technologies, and knowledges transferred from existing firms and activities. There is growing evidence that local economic diversification, the emergence of new activities and specializations, is often shaped by the existing industrial structure of an area, both in terms of influencing the scope for such diversification and the form it takes. In this sense, some local economic structures may be more "enabling" environments for branching and diversification to occur than others, which might be "constraining" in this regard. The point is that not only may a given local industrial path evolve and adapt over time but that a local economy's entire economic structure may evolve in a path-dependent manner.

An evolutionary economic geography perspective would go further and argue that when local industrial diversity or variety is of the "related" kind (whether in terms of input–output linkages, similar inputs, complementary knowledges and technologies, overlapping product mixes, etc. – see Frenken et al. 2007), the scope for such "layering," "conversion," "recombination," and "branching" processes is significantly enhanced. Work on "economic complexity" has likewise identified the ability to accumulate complementary productive capabilities ("complexity" of products, exports, and skills) over time as conducive to the emergence of new paths of development, which themselves add to those capabilities to develop yet more complexity (Hidalgo and Hausmann 2009). The image is again one of evolutionary path dependence–path creation.

Conceptually, therefore, we can think of a "developmental–evolutionary" model of path dependence in which there is a recursive relationship between the sectoral, technological, and institutional structures of a local economy, on the one hand, and the processes that drive economic evolution as these operate within and upon the local economy, on the other (see Fig. 1; Martin 2010a). At any one moment in time, a local economy will consist of a particular population of firms and businesses in specific sectors of activity and specialization, characterized by a range of productive capabilities, technologies, and processes, employing workers with certain skills, and linked to and regulated by, to varying degrees, local institutional arrangements. This local industrial–technological–institutional ecology provides the setting, the context, within which various processes of economic evolution operate. These processes depend on the conditions and circumstances obtaining in the local economy itself but also on various external factors, such as competitive pressures, new market opportunities, linkages to extra-local firms, external technological developments, regulatory norms, economic policies, and the like.
The range of such external factors and developments that are relevant to, and the particular effect they have on, a local economy will itself also depend, in part at least, on that locality's existing economic and technological structures. In combination, these local structures and external factors will stimulate, influence the scope for, shape the direction of, and constrain the mechanisms that make for change versus continuity within the firms and industries in the locality (the processes akin to "layering," "conversion," and "recombination" referred to above, namely, the local entry and exit of firms, the pace and direction of innovation by local firms, and the emergence of new or related activities). As the local economy's sectoral and technological structure changes in response to these evolutionary processes, this then alters the ecology within which those processes operate, which then produces further change (or continuity) in the local economy's development path, and so on.

[Fig. 1 A "developmental–evolutionary" model of local economic path dependence (based on Martin 2010a). The diagram depicts a recursive cycle unfolding over time: external factors (competitive pressures, technological developments, regulatory arrangements, etc.) stimulate or constrain local developmental processes; existing local economic structures (sectoral variety and industrial specializations, dominant technological processes and systems, institutional forms and arrangements, entrepreneurial culture, and the skill profile of the workforce) both enable and constrain those developmental processes ("layering," e.g., the addition of new firms and the exit of old ones; "conversion," e.g., product and technological reorientation by firms; "recombination," e.g., the reconfiguration of existing firms and industries into new, related activities); these act as sources of economic evolution and change through path evolution mechanisms (changes in the population of firms, mutation of sectoral structure, technological innovation and adaptation, branching into new, related activities), comprising both direction-determining and rate-determining processes; and the resulting path-dependent change in structures and resources provides a new context for development.]

The recursive model set out in Fig. 1 is obviously a highly simplified representation of what in reality is a highly complex set of processes that can operate at different historical speeds across different firms and sectors and at different spatial scales. But the key idea behind this model of path dependence is that existing local economic and associated structures condition and influence the scope for and nature of developmental processes, which in turn shape the pace and direction of change in those structures, which in their turn feed back to condition and influence local economic developmental processes. Such a system is recursive and path dependent, but also potentially evolutionary. As such it offers a wider interpretation of path dependence than the "lock-in" model while encapsulating the latter as a special case (see Table 4).

Both the scope for and the pace at which local industrial paths evolve will obviously vary from one industry to another and from place to place. The slower the process, the more it is possible to describe the path or locational pattern as conforming to the standard "lock-in" model of path dependence: the research task in such cases is to determine why a local industry has failed to adapt and evolve. The faster the adaptation process, the less the notion of "lock-in" seems appropriate, the more the path will mutate in one way or another, and the more it is possible to describe the process as one of "developmental–evolutionary" path dependence.

Allowing for purposive and intentional action by economic agents, for the normal ongoing processes of firm population dynamics, and for innovation, competition, and entrepreneurship sets a "developmental–evolutionary" model of path dependence apart from the standard conception. A "developmental–evolutionary" model more readily admits the complex range of actual evolutionary paths that are found among industries, technologies, and local and regional economies.

Such a model also provides a richer perspective on the issue of new path creation. The standard model of path dependence does not have much to say about how, or where, new industrial–technological paths come into being, other than ascribing the emergence of new industries or technologies, and their locational geographies, to random, happenstance, or serendipitous events. Witt (2003) has questioned the validity, or at least the generality, of this "virgin market" idea, the assumption that the emergence of a new technology, product, or industry, and any competition with other emergent rivals, takes place without reference to and uninfluenced by inherited market conditions. Likewise, a "virgin landscape" assumption, the idea that where a new industry or technology emerges is unrelated to preexisting regional industrial and technological structures, can be challenged. There is a curious contradiction in the standard path dependence model, in that path dependence seems to matter only once a new industry or technology has emerged but plays no part in influencing where it emerges. In fact, there is growing evidence in economic geography that the inherited, preexisting industrial structure of a region or locality often does have an influence on whether a new industrial path emerges or develops there.

A "developmental–evolutionary" model of path dependence, by giving recognition to developmental processes such as "layering," "conversion," and "recombination" and how these are the outcomes of the purposive and intentional behaviors of local economic actors, admits of several possible mechanisms by which new local industrial pathways of economic development can emerge from preexisting ones. New paths can emerge from old. Local new path creation may thus itself form part of the developmental–evolutionary path dependence process: local preexisting capabilities, competencies, technologies, and knowledges can provide a resource base for local actors to deliberately venture into new or related fields. Similarly, a "developmental–evolutionary" model offers a wider perspective on how local industrial and technological paths come to an end.
Table 4 The "lock-in" and "developmental–evolutionary" models of path dependence compared

1. Initial conditions
"Lock-in" model: Inherited market and local economic and technological conditions assumed unimportant for creation of paths ("virgin market" assumption and "windows of locational opportunity" assumed open)
"Developmental–evolutionary" model: Inherited and constructed. Preexisting market and local conditions can enable or constrain new economic and technological possibilities. Previous and existing paths condition possibilities for new ones

2. Contingencies
"Lock-in" model: Exogenous and manifest as unpredictable, nonpurposive, and somewhat random events
"Developmental–evolutionary" model: Emergent and serving as the embedded contexts for ongoing action by agents

3. Self-reinforcing mechanisms
"Lock-in" model: Assumed key and given. "Systemic" mechanisms and processes that compel and constrain local agents' decisions, which are largely beyond their control
"Developmental–evolutionary" model: May or may not be present locally. Not essential for path dependence and can be strategically manipulated by local (and extra-local) actors

4. Lock-in
"Lock-in" model: Progressive convergence to a particular spatial distribution of an industry or local industrial structure or specialism, which is assumed to be an equilibrium state and unchanging until disrupted by an external shock
"Developmental–evolutionary" model: Lock-in (and equilibrium) is a special case. Most industrial location patterns and local industries undergo some form of ongoing change, adaptation, and evolution, both within industry (products, processes) and between industries (changing structural composition of local economy)

5. Path plasticity
"Lock-in" model: While sometimes acknowledged (e.g., David 2007), lock-in to a stable outcome or path generally assumed to be the norm
"Developmental–evolutionary" model: Local industrial and technological paths can and do evolve incrementally. Actors constantly seek improved and new product and process opportunities, and the cumulation of such behavior imparts mutation to an industry or specialism

6. Path decline/destruction
"Lock-in" model: Assumed that some sort of (external) shock is necessary to "de-lock" a local industrial or technological path (state). Little discussion of gradual or long-run processes of decline of an industry or technology
"Developmental–evolutionary" model: Local industries and technologies display various types (and speeds) of relative and absolute decline. Causes of decline involve the complex interaction of exogenous and endogenous factors (e.g., failure or slowness of local firms to innovate and adapt in the context of changing market and competitive conditions)

7. New path creation
"Lock-in" model: Not well theorized. What and where new industries and technologies emerge are contingent events
"Developmental–evolutionary" model: Constructed. New paths are typically the outcome of purposive and deviating behavior by agents, often influenced by or dependent on preexisting local conditions. May involve branching into new related activities

8. Model of economic evolution
"Lock-in" model: Punctuated equilibrium, whereby phases of stability (lock-in) are periodically disrupted by exogenous shocks (e.g., introduction of new technology)
"Developmental–evolutionary" model: Mutational and adaptive, allowing for incremental as well as discontinuous change. Endogenous evolution arising from the birth and death of firms, learning, the creation and transfer of knowledge across firms and industries, and processes of competitive selection among firms

Note: In compiling Table 4, I have been influenced by Garud et al. (2010). However, their comparison is between the "lock-in" model of path dependence and what they regard as a distinctly different perspective on industrial and technological development which they call "path creation." Since path dependence is an ongoing (re)creative process, driven by the activities and decisions of agents, these authors' counterposition of path dependence and path creation is not perhaps that helpful. I prefer to follow Sydow et al. (2009) here, in seeing path dependence and path creation as inextricably interlinked and interactive, in an ongoing developmental–evolutionary process.

In the standard "lock-in" model, it is assumed that a path is broken by some unpredictable external "shock." Although such shocks can and do occur and can certainly undermine or disrupt a local industry and perhaps lead to its decline and demise, to attribute the decline of an industrial or technological path invariably to the impact of some unexpected or unpredictable exogenous shock is not especially enlightening. To be sure, a local industry is not a closed system and is subject to a variety of external pressures (and new opportunities). But such pressures and challenges are more or less constant features of modern economic life, not spasmodic, infrequent events. What matters, therefore, is the nature of the pressures that impinge on a local industry and how the industry reacts to them, which, in turn, depends on the industry's resilience and adaptability (Hassink 2010; Martin 2012b). Furthermore, the decline of a local industrial path may arise endogenously, for example, because of the exhaustion of innovation by local firms, which then become uncompetitive and decline, so that the industry shrinks. It may also occur if local firms switch to a different, perhaps related, sector of activity on a new path that is perceived as affording more profitable opportunities. Martin and Sunley (2006) suggested a number of possible mechanisms by which a local industrial path may be disrupted or even destroyed, most of which revolve around the interaction of exogenous and endogenous forces. Yet their analysis, like that of Castaldi and Dosi (2006) and, indeed, like many of those in economic geography, was founded on the assumption that the problem is one of identifying the mechanisms by which a "locked-in" stable state can be "de-locked." But if "lock-in" never occurs, clearly a different conceptualization is needed as to how industrial development paths lose momentum, atrophy, and decline.

In several respects, then, a "developmental–evolutionary" model of path dependence differs significantly from the standard "lock-in" model. Whether it be in terms of the initial conditions that influence when and where a new industry or technology emerges, or the mechanisms that give rise to its path-dependent development across space, or the processes by which new industries replace old, or the type of local industrial evolution that is implied, the "developmental–evolutionary" model would seem more encompassing of the range of local economic evolutionary trajectories actually encountered. Furthermore, it incorporates the standard or strict "lock-in" model of path dependence as a special case. Path dependence should thus be seen as a complex process that can take varying forms and produce varying rates of local industrial–technological change and evolution. In fact, the "developmental–evolutionary" model focuses attention precisely on why industrial paths vary in their evolvability (capacity to generate variety, of products, technologies, and indeed whole new industries) and adaptability and why these processes vary from one local economy to another.
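The recursive logic of Fig. 1 can be given a deliberately minimal numerical illustration. The following toy simulation is offered purely as an exposition of the argument: the functional forms, parameter values, and relatedness structure are invented for this purpose and are not drawn from Martin (2010a) or from any empirical case.

```python
# Toy illustration of "developmental-evolutionary" path dependence.
# Sector shares of a hypothetical local economy evolve through stylized
# "layering" (entry biased toward sectors related to existing strengths),
# "conversion" (incumbents drifting toward related sectors), and
# contingency (noise standing in for purposive, deviating behavior).
import numpy as np

rng = np.random.default_rng(1)
n_sectors, periods = 6, 200

# Relatedness declines with "distance" between sectors (hypothetical)
idx = np.arange(n_sectors)
related = 0.5 ** np.abs(np.subtract.outer(idx, idx))

shares = np.full(n_sectors, 1.0 / n_sectors)    # initial sectoral structure
for t in range(periods):
    entry = 0.02 * related @ shares             # "layering": related entry
    drift = 0.01 * (related @ shares - shares)  # "conversion": reorientation
    noise = rng.normal(0.0, 0.005, n_sectors)   # contingency / agency
    shares = np.clip(shares + entry + drift + noise, 0.0, None)
    shares /= shares.sum()                      # shares remain a distribution

# The final structure is conditioned by, but not locked into, its history
print(np.round(shares, 3))
```

Because entry and reorientation are weighted by the inherited structure, early chance gains in one sector spill over to related sectors, so the economy's composition keeps evolving in a history-conditioned way rather than converging to a frozen, "locked-in" state.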

5 Conclusion

According to Boschma and Frenken (2006, pp. 280–281), "evolutionary theory deals with path dependent processes, in which previous events affect the probability of future events to occur" (emphasis in the original). The original formulations of path dependence in economics focused on a model that defined the process as the progressive "lock-in" – to a stable state – of particular technologies. This model has been taken up in various disciplines, including economic geography, where it has been used to explain the emergence and self-reinforcing spatial localization of industries. However, the "lock-in" model represents a very restricted model of spatial economic evolution. In recent years, this restricted definition or interpretation has been increasingly questioned. While in some applications the argument has been made that the concept of path dependence should be very strictly defined in terms of "lock-in" by endogenous increasing returns effects (e.g., see the non-spatial discussion of institutional change by Rixen and Viola 2015), in evolutionary economics itself there have been dissenting voices about such a narrow conceptualization. Thus, Witt (2003, p. 124), a leading evolutionary economist, has argued that the notion of "lock-in" is antithetical to industrial and technological evolution:

[S]ome doubts should be raised about the plausibility of both the theoretical underpinnings of, and the empirical evidence for, technological or industrial 'lock-in' ... sooner or later there will always be new rivals who threaten the market dominance of a technology or variant. The erosion of market dominance under competitive pressure by new technologies supports Schumpeter's empirical generalisation that an incessant process of creative destruction characterises modern industrial capitalism. (Witt 2003)

The same argument must surely apply to local and regional economies. The issue, then, is whether the notion of path dependence should be narrowly restricted to and reserved for situations of true "lock-in," as recently argued by Vergne and Durand (2010), or whether the concept can be meaningfully widened to incorporate processes and systems – including local economies – that display ongoing developmental evolution.

Adherents of the narrow, "lock-in" view will no doubt see this idea of a "developmental–evolutionary" model of path dependence as too broad and lacking definitional precision and analytical formalism, perhaps even not as path dependence at all. The problem with this reaction is not only that the empirical applicability of the standard or canonical "lock-in" model would seem to be limited, since many industries and technologies – and most local and regional economies – exhibit varying degrees of ongoing development and adaptation, but that such ongoing development is itself characterized by some degree of path dependence and thus requires conceptualization: we need a framework for analyzing how and why local economies differ in the rate and direction of path-dependent adaptation and evolution.

What precise form such a framework will take is a task for future research, but it will entail linking path dependence much more closely with other evolutionary concepts used in economic geography. The "developmental–evolutionary" model proposed above offers considerable scope in this direction. It also entails methodological deliberation.

One of the possible attractions of the standard "lock-in" model of path dependence, especially perhaps to economists, is that it can be given a formal (i.e., mathematical and equilibrium) representation (as outlined in Table 1). But formalism can come at a cost: it can close off ("lock-out," one is tempted to say) empirical patterns and outcomes that do not fit the prevailing model yet which, if subjected to appreciative theorizing, could well suggest other interpretations and generalizations that are more relevant. The "developmental–evolutionary" perspective on spatial economic path dependence suggested here is intended to encourage greater appreciative theorizing from concrete cases in order precisely to widen the applicability of the notion.

In this sense, future work on path dependence in the spatial economy might well resemble the new generation of "history-friendly" models that are being pioneered in evolutionary economics, in which it is explicitly recognized that industries and technologies can take a variety of evolutionary paths and where appreciative theorizing from concrete case studies is used to construct more formal models that take full account of that variety (see, e.g., Malerba 2010).

Ultimately, general formal models of path dependence must be complemented by investigation of the actual economic, technological, and institutional histories of the regions or localities under study if we are to grasp the nature, role, and outcomes of the path dependence processes we are trying to interpret. At the same time, a purely descriptive historical account of a region's or locality's economic development is inadequate without some attempt to situate that account within a theoretical framework. How to elaborate a meaningful general theory of spatial–economic path dependence that simultaneously allows for the place-specific context and particularities of path dependence is arguably one of the key challenges for future research in this field.

6 Cross-References

▶ Clusters, Local Districts, and Innovative Milieux
▶ Endogenous Growth Theory and Regional Extensions
▶ New Economic Geography After 30 Years
▶ Regional Growth and Convergence Empirics
▶ Relational and Evolutionary Economic Geography

References

Allen T, Donaldson D (2018) The geography of path dependence. Working paper, Economics Faculty, MIT
Arthur WB (1989) Competing technologies, increasing returns, and 'lock-in' by historical events. Econ J 99:116–131
Arthur WB (1994) Industry location patterns and the importance of history. In: Arthur WB (ed) Increasing returns and path dependence in the economy. University of Michigan Press, Ann Arbor, pp 49–68
Boas TC (2007) Conceptualising continuity and change: the composite-standard model of path dependence. J Theor Polit 19:33–54
Boschma R, Frenken K (2006) Why is economic geography not an evolutionary science? J Econ Geogr 6(3):272–302
Boschma R, Martin RL (2007) Constructing an evolutionary economic geography. J Econ Geogr 7(5):537–548 (Special issue: Evolutionary economic geography)
Boschma R, Martin RL (eds) (2010) Handbook of evolutionary economic geography. Edward Elgar, Cheltenham
Castaldi C, Dosi G (2006) The grip of history and the scope for novelty: some results and open questions on path dependence in economic processes. In: Wimmer A, Kössler R (eds) Understanding change: models, methodologies and metaphors. Palgrave Macmillan, London, pp 99–128
Coe N (2011) Geographies of production, 1: an evolutionary revolution? Prog Hum Geogr 35(1):81–91
David PA (1985) Clio and the economics of QWERTY. Am Econ Rev 75(2):332–337
David PA (1986) Understanding the economics of QWERTY: the necessity of history. In: Parker WN (ed) Economic history and the modern economist. Blackwell, Oxford, pp 30–49
David PA (2005) Path dependence in economic processes: implications for policy analysis in dynamical systems contexts. In: Dopfer K (ed) The evolutionary foundations of economics. Cambridge University Press, Cambridge, pp 151–194
David PA (2007) Path dependence: a foundational concept for historical social science. Cliometrica 1(2):91–114
Frenken K, Van Oort FG, Verburg T (2007) Related variety, unrelated variety and regional economic growth. Reg Stud 41:685–697
Garretsen H, Martin RL (2010) Rethinking (new) economic geography models: taking geography and history more seriously. Spat Econ Anal 5(2):127–160
Garud R, Karnøe P (2001) Path creation as a process of mindful deviation. In: Garud R, Karnøe P (eds) Path dependence and creation. Lawrence Erlbaum, London, pp 1–38
Garud R, Kumaraswamy A, Karnøe P (2010) Path dependence or path creation? J Manag Stud 47(4):760–774
Grabher G (1993) The weakness of strong ties: the lock-in of regional development in the Ruhr area. In: Grabher G (ed) The embedded firm. Routledge, London, pp 255–277
Hassink R (2010) Regional resilience: a promising concept to explain differences in regional economic adaptability? Camb J Reg Econ Soc 3(1):45–58
Hidalgo CA, Hausmann R (2009) The building blocks of economic complexity. Proc Natl Acad Sci 106:10570–10575
Krugman P (1991) History and industrial location: the case of the US manufacturing belt. Am Econ Rev 81:80–83
MacKinnon D, Cumbers A, Pike A, Birch K, McMaster R (2009) Evolution in economic geography: institutions, political economy, and adaptation. Econ Geogr 85(2):129–150
Mahoney J (2000) Path dependence and historical sociology. Theory Soc 29:507–548
Mahoney J (2006) Analyzing path dependence: lessons from the social sciences. In: Wimmer A, Kössler R (eds) Understanding change: models, methodologies and metaphors. Palgrave Macmillan, London, pp 129–139
Mahoney J, Schensul D (2008) Historical context and path dependence. In: Tilly C (ed) The Oxford handbook of contextual political analysis. Oxford University Press, Oxford, pp 448–471
Malerba F (2010) Industry evolution and history-friendly models. Plenary paper, International Schumpeter Society Conference on Innovation, Organisation, Sustainability and Crisis, Aalborg, 21–24 June 2010. http://www.schumpeter2010.dk/index.php/schumpeter/schumpeter2010/paper/viewFile/491/208. Accessed June 2011
Martin RL (2010a) Roepke lecture in economic geography – rethinking regional path dependence: beyond lock-in to evolution. Econ Geogr 86(1):1–27
Martin RL (2010b) The 'new economic geography': credible models of the economic landscape? In: Lee R, Leyshon A, McDowell L, Sunley P (eds) The Sage companion to economic geography. Sage, London, pp 53–72
Martin RL (2012a) (Re)placing path dependence: a response to the debate. Int J Urban Reg Res 36(1):179–192
Martin RL (2012b) Regional economic resilience, hysteresis and recessionary shocks. J Econ Geogr 12(1):1–32
Martin RL, Sunley PJ (2006) Path dependence and regional economic evolution. J Econ Geogr 6(4):395–437
Martin RL, Sunley PJ (2011) Conceptualising cluster evolution: beyond the life cycle model? Reg Stud 45(10):1295–1318 (Special issue on cluster life cycles, eds. R Boschma and D Fornahl)
Metcalfe JS, Foster J, Ramlogan R (2006) Adaptive economic growth. Camb J Econ 30(1):7–32
Oosterlynck S (2012) Path dependence: a political economy perspective. Int J Urban Reg Res 36(1):158–165
Page S (2006) Path dependence. Q J Polit Sci 1:87–115
Rixen T, Viola LA (2015) Putting path dependence in its place: towards a taxonomy of institutional change. J Theor Polit 27:301–322
Schneiberg M (2007) What's on a path? Path dependence, organizational diversity and the problem of institutional change in the US economy, 1900–1950. Soc Econ Rev 5:47–80
Setterfield M (1998) Rapid growth and relative decline: modelling macroeconomic dynamics with hysteresis. Macmillan, London
Setterfield M (2009) Path dependency, hysteresis and macrodynamics. In: Arestis P, Sawyer M (eds) Path dependency and macroeconomics. Palgrave Macmillan, London, pp 37–79
Sydow J, Schreyögg G, Koch J (2009) Organizational path dependence: opening the black box. Acad Manag Rev 34(4):689–709
Vergne J, Durand R (2010) The missing link between the theory and empirics of path dependence: conceptual clarification, testability issues and methodological implications. J Manag Stud 47(4):736–759
Witt U (2003) The evolving economy. Edward Elgar, Cheltenham

Agglomeration and Jobs

Gilles Duranton

Contents
1 Introduction
2 Cities, Worker Productivity, and Wages
3 Firm Dynamics Within Cities
4 City Functionality, Urban Systems, and Policies
5 Conclusions
6 Cross-References
References

Abstract

This chapter discusses the literature on agglomeration economies from the perspective of jobs and job dynamics. It provides a partial review of the empirical evidence on agglomeration externalities; the functionality of cities; the dynamic relationship between cities, jobs, and firms; and the linkages between cities. We provide the following conclusions. First, agglomeration effects are quantitatively important and pervasive. Second, the productive advantage of large cities is constantly eroded and needs to be sustained by new job creation and innovation. Third, this process of creative destruction in cities, which is fundamental for aggregate growth, is determined by the characteristics of urban systems and broader institutional features. We highlight important differences between developing countries and more advanced economies. A major challenge for developing countries is the transformation of their urban systems into drivers of economic growth.

Gilles Duranton is also affiliated with the National Bureau of Economic Research and the Centre for Economic Policy Research.
G. Duranton (*)
The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
e-mail: [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_33

Keywords

Agglomeration economies · Local labor markets · City growth · Urbanization

1 Introduction

This chapter reviews the literature on agglomeration economies from the perspective of jobs and labor markets. In cities, jobs are more productive because of agglomeration effects. These take place through a variety of channels: resource sharing, quicker and better matching, and greater knowledge spillovers. Section 2 provides a discussion of these issues. The bottom line is straightforward: cities have a positive effect on productivity and wages.

More productive urban jobs, however, do not come in a void. Section 3 broadens the discussion to job creation and firm dynamics in cities. More productive jobs in cities need to be created. Innovation, entrepreneurial activity, and firm growth all play a crucial role in this respect. Adding to this, more productive jobs do not remain more productive forever. This productivity advantage is constantly eroded and needs to be constantly re-created. The creative destruction process, that is, more firm entry and exit and a higher portion of innovative young firms, is also fundamental.

In turn, the dynamics of firms and jobs in cities is shaped by the broader characteristics of urban systems. In Sect. 4, we highlight major differences between cities in developing countries and more advanced economies. In short, the urban system of many developing countries acts as a brake on economic growth. A major challenge for these countries is the transformation of their urban systems into drivers of economic growth.

More specifically, cities in developing countries appear to be far less functionally specialized than cities in more advanced economies. This hampers the dynamism of the largest cities in developing countries, which are burdened by many ancillary activities. These activities add to urban crowding without adding to agglomeration benefits. Better infrastructure, in particular better transportation infrastructure, and a reduction in favoritism towards large cities may be a way to remedy these problems. Policies to foster job creation directly may be tempting, but their record in more advanced economies is unsatisfactory.

In addition, developing cities also function less efficiently and face challenges that differ from those of cities in more advanced economies. An appropriate management of the transition to full urbanization, a strengthening of urban governance, a reduction in labor market duality, and a reduction or the full elimination of land market duality are key challenges that must be tackled for developing cities to take full advantage of agglomeration effects and foster aggregate growth.

2 Cities, Worker Productivity, and Wages

Cities enjoy a productive advantage over rural areas, and this advantage is larger for larger cities. The positive association between various measures of productivity and urban scale has been repeatedly documented. That larger cities obtain higher scores on many productivity metrics, from wages to output per worker to the total factor productivity of firms, is now beyond doubt. Most of the studies reviewed by Combes and Gobillon (2015) find an elasticity of wages or firm productivity with respect to city employment or urban density between 0.02 and 0.10. As shown by Duranton (2015), these findings also hold widely in cities in developing countries.

More formally, this type of work involves regressing an outcome variable by location on a measure of agglomeration. In the early literature, the typical regression of choice involved using output per worker as dependent variable and city population as explanatory variable. In the early 1990s, authors often employed more indirect strategies and started to use variables such as employment growth or firm creation as outcome measures. More recently, the literature has moved to microdata and returned to more direct outcome measures, namely, the total factor productivity of firms and wages. More precisely, recent studies estimate a regression like

\log w_{ic(i)} = \alpha \log Pop_{c(i)} + \eta_{c(i)} + u_i + e_{ic(i)} \quad (1)

where c denotes cities and i denotes individuals or groups of individuals. The dependent variable is w, the wage, and the explanatory variables are log Pop, the log of population as a measure of urban scale; η, a city effect (usually proxied through a number of control variables at the city level); and u, an individual effect (often proxied through observable individual characteristics). Finally, e is an error term. The estimated value of the coefficient of interest, α, is usually positive and significant. Similar regressions can be proposed for firm data using measures of firm-level productivity and firm characteristics.

After Ciccone and Hall (1996), density has often been favored relative to population since it appears to yield more reliable results. The reason is probably that density-based measures of agglomeration are more robust to zoning idiosyncrasies. For instance, treating Washington and Baltimore as one big consolidated metropolitan area or two separate cities makes a big difference to their employment count but little difference to density.

After asserting this robust statistical association between productivity outcomes and agglomeration, the first question regards whether the estimated coefficient α in the regression described by Eq. (1) reflects the causal effect of agglomeration on wages. An examination of Eq. (1) reveals three possible sources of bias. They all come from the fact that, as highlighted by the notations in Eq. (1) above, the measure of agglomeration is indexed by c(i), that is, the city c is chosen by worker i. Ideally, one would like to compare the same workers across the cities that they have chosen and those that they have not chosen. In the absence of randomized experiments, this is not possible. Greenstone et al.'s (2010) quasi-experiment on "million dollar plants" is what comes closest to this ideal for firms' location choices.
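To make the mechanics of Eq. (1) concrete, here is a minimal simulation sketch. It is purely illustrative (synthetic data and invented parameter values; not code from any of the studies cited): it generates worker-level wages with a true elasticity of 0.03, lets more able workers sort into larger cities, and shows how the naive estimate of α is inflated until worker quality is controlled for, mirroring the quality-of-labor bias discussed below.

```python
# Illustrative sketch: estimating the agglomeration elasticity alpha of
# Eq. (1) by OLS on simulated data, with and without a worker-quality
# control. All numbers are invented for exposition.
import numpy as np

rng = np.random.default_rng(0)
n_cities, n_workers = 50, 20_000

log_pop = rng.normal(12.0, 1.5, n_cities)       # log city population
city_fx = rng.normal(0.0, 0.02, n_cities)       # city effects eta_c
city = rng.integers(0, n_cities, n_workers)     # worker i's city c(i)

true_alpha = 0.03
# Sorting: worker ability correlates with the size of the chosen city
ability = 0.5 * (log_pop[city] - log_pop.mean()) + rng.normal(0.0, 1.0, n_workers)
log_wage = (true_alpha * log_pop[city] + city_fx[city]
            + 0.05 * ability + rng.normal(0.0, 0.3, n_workers))

def ols(y, *regressors):
    """OLS with an intercept; returns the coefficient vector."""
    X = np.column_stack((np.ones(len(y)),) + regressors)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

alpha_naive = ols(log_wage, log_pop[city])[1]          # omits ability: biased up
alpha_ctrl = ols(log_wage, log_pop[city], ability)[1]  # controls for ability
print(f"alpha without controls:        {alpha_naive:.3f}")
print(f"alpha with ability controlled: {alpha_ctrl:.3f}")  # ~0.03
```

With the sorting term switched off, the two estimates coincide; the gap between them is the sorting bias. In actual studies, ability is unobserved, which is why the literature relies on extensive individual controls, worker fixed effects in panel data, or instruments, as discussed next.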


The first source of bias is the possible link between city effects (which are not observed directly) and the variable of interest, city population or density. Put differently, the "quantity of labor" may be endogenous, and it is reasonable to expect workers to go to more productive cities. A possible solution to this problem is to use instruments for city population or density, as in Ciccone and Hall (1996). These instruments need to predict current population patterns but must be otherwise uncorrelated with city productivity. Deep historical lags such as population from 200 years ago or soil characteristics are plausible instruments. Studies using this type of approach typically find that correcting for the endogeneity of population has only a mild downward effect on the estimation of the coefficient of interest α.

The second main identification problem in the estimation of Eq. (1) regards a possible correlation between the measure of city population and individual effects. That is, the quality of labor may be endogenous, and we expect more productive workers to reside in larger cities. A first possible solution to this problem is to control for an extensive set of individual characteristics. A more drastic solution is to use (whenever possible) the longitudinal dimension of the data and impose worker fixed effects, as in Combes et al. (2008) and the subsequent literature. The endogenous quality of labor seems to be an important source of bias in the estimation of Eq. (1). The estimated value of α is typically reduced by 30–50% using extensive individual controls or worker effects. This said, one needs to be careful. Imposing worker effects improves the quality of the estimation, but it is not a perfect solution since it assumes that mobility is exogenous.

Related to this last issue, the third source of bias in the estimation of Eq. (1) is the possibility of a correlation between the error term and the measure of city population of interest. If, for instance, workers move more easily from large cities to small cities than in the opposite direction in the case of a good external wage offer, this will create another source of bias, which in this particular situation leads to an underestimate of agglomeration economies. No satisfactory solution to this problem has been proposed so far.

At this point, the conclusion of the agglomeration literature is that there is a causal static effect of cities and urbanization on wages in more advanced economies but that this effect represents only about half the measured association between city population or density and wages (or alternative measures of productivity). The rest of the association between population or density and wages reflects the sorting of more productive workers into larger and denser cities and, to a lesser extent, reverse causality and workers moving to more productive places. Recent investigations that tackle the concerns mentioned above find agglomeration elasticities around 2%. They thus suggest rather modest static effects of cities on productivity. The literature from developing countries often uses less sophisticated approaches but finds results that are comparable and, if anything, indicative of moderately stronger agglomeration effects.

After questioning its causal aspect, the second key question about the estimation of agglomeration effects regards their sources. When asking about the "sources" of agglomeration, the literature frequently confuses two separate questions. The first is about which markets are affected by these agglomeration effects, and the second is about which mechanisms actually occur. Regarding the "where" question, it is customary to distinguish the markets for (intermediate) goods, the market for labor, and the (absent) market for ideas and knowledge. In terms of mechanisms, we often distinguish between sharing, matching, and learning mechanisms. "Sharing" is about the many possible benefits from the mutualization of specialized input providers, the diversity of local goods, the division of labor, or of risks. "Matching" is about the greater probability of finding another party such as a worker, an employer, a supplier, or an investor and the greater quality of the match with that party. Finally, "learning" is about the better generation, diffusion, and accumulation of knowledge. The latter set of mechanisms is regularly referred to as knowledge spillovers.

Because of the wide variety of possible mechanisms and the markets where they can take place, the literature that investigates the sources of agglomeration benefits is much more heterogeneous than the literature that attempts to measure the overall benefits from agglomeration. The latter naturally coalesces around the estimation of Eq. (1).

First, there is a diversity of work which provides evidence of an association between some aspect of agglomeration, such as a particular mechanism or market, and measures of agglomeration such as city size. Let us take only a few recent examples (see Combes and Gobillon 2015 for a more exhaustive discussion). Taken together, these studies are suggestive that many of the agglomeration mechanisms described by the theoretical literature are at work in a variety of markets. This conclusion must be taken cautiously, however. Establishing the direction of causality in this type of work is even harder than when attempting to measure the overall effects of agglomeration.

To understand this point and the pitfalls associated with this type of work, let us use the analysis of Charlot and Duranton (2004) on workplace communication. They show that communication is associated positively with city size and with wages. This leads them to conclude that communication spillovers could account for up to a quarter of agglomeration benefits. However, this finding could be explained in part by the greater sorting of good communicators into larger cities. This is the equivalent of the quality-of-labor bias discussed above. This worry can be reduced by comparing movers and stayers in cities, as Charlot and Duranton (2004) do. It is difficult to eliminate it entirely though. In addition, one also needs to show that greater communication in cities is not the by-product of another agglomeration force. Workers in larger cities may communicate more because firms outsource more of their output, which requires some co-ordination. In such a case, the real source of agglomeration benefits may be input-output linkages, not communication spillovers. To get round this problem, Charlot and Duranton (2004), who use rich firm-level data, suggest instrumenting workplace communication by measures of organizational changes such as a flattening of the hierarchy. These changes typically increase the need for horizontal communication. This type of instrument is nonetheless valid only if changes in organization are unrelated to other sources of agglomeration benefits such as labor pooling or input-output linkages. That firm reorganization affects worker communication behavior but has no direct effect on recruiting practices or outsourcing is plausible but not certain.

More generally, studies that focus on one particular source of agglomeration face a major missing-variable problem: the other sources of agglomeration are absent from the regression even though they are expected to be correlated with both wages (or other productivity measures) and measures of agglomeration such as city size. Given how difficult it is to measure many aspects of agglomeration and given also that the list of possible agglomeration sources is open, considering all sources of agglomeration in one regression is not a feasible option. A more reasonable path forward is, following Ellison et al. (2010), to consider several classes of agglomeration sources in the same approach.

Ellison et al. (2010) assess how much labor pooling, input-output linkages, and spillovers account for co-agglomeration between industries in the USA. They use a measure of industry co-agglomeration and find more co-agglomeration among (i) industries that buy from each other, (ii) industries that use a similar workforce, and (iii) industries that share a common scientific base. To reduce the possibility that co-agglomerated industries end up buying from each other or using similar workers because of their proximity, they instrument their US measures of input-output linkages and labor pooling using corresponding UK data. Of course, if the biases are the same in the UK as in the USA, these instruments are of limited value. Another caveat is that input-output linkages are possibly more easily measured using input-output matrices than spillovers using patent citations. This can also lead to biased estimates since a positive correlation with both linkages and spillovers is likely to be picked up mainly by the better-measured linkage variable. This said, Ellison et al. (2010) confirm that the three motives for agglomeration they consider are at play, with input-output linkages playing a more important role.

Even if we abstract from the uncertainty around those results, the notion that several mechanisms, each operating in several markets, contribute to agglomeration benefits is problematic for policy. At their heart, agglomeration benefits rely on market failures associated with the existence of small indivisibilities with sharing mechanisms, thick market effects with matching mechanisms, and uncompensated knowledge transfers with learning mechanisms. That is, there are possibly many market failures at play in many markets. In turn, this implies that there may be no hope of fostering agglomeration economies through a small number of simple policy prescriptions.

Before broadening the discussion, there are four further features of agglomeration that have implications for workers and jobs in cities. The first is the issue of the sectoral scope of agglomeration and whether agglomeration effects accrue mostly within or across sectors. Agglomeration effects within sectors are referred to as localization economies and between sectors as urbanization economies. When estimating a more general version of Eq. (1) that accounts for both city size or density and the degree of same-sector specialization, extant research has found evidence of both localization and urbanization effects. There are two interesting nuances. The first is the presence of significant heterogeneity across industries. This heterogeneity follows an interesting pattern as it appears that more technologically advanced industries benefit more from urbanization economies whereas more mature industries benefit more from localization economies. Second, the calculations of Combes et al. (2008) indicate that in France the benefits from localization economies are smaller than those of urbanization economies and mostly uncorrelated with local wages. Put differently, increased local specialization has only small benefits and does not contribute to making workers richer.

The second extra feature of agglomeration is the notion that not all workers benefit equally from urban scale. Equation (1) estimates an "average" agglomeration effect. As highlighted by, among others, Glaeser and Resseger (2010), agglomeration effects appear stronger for more educated workers in the USA. Higher returns in larger cities should in turn provide stronger incentives to more skilled workers to locate there. Hence, these results are consistent with the well-documented fact that workers in larger cities in more advanced economies tend to be more educated and better skilled (e.g., Combes et al. 2008).

Next, while not all workers benefit equally from agglomeration effects, it also appears that not all workers contribute equally to these effects either. There is a large literature on human capital externalities suggesting that workers enjoy higher wages when surrounded by more educated workers. Estimates of external returns to education are typically between 50 and 100% of the corresponding estimates of private returns to education, in particular for university graduates. These findings are robust to a number of estimation concerns and suggestive of large effects. It is beyond the scope of this chapter to review this literature extensively. See instead Combes and Gobillon (2015) for an in-depth survey.

Finally, there is also emerging evidence from US and European data that wage growth also depends on city size/density. To show this, one can estimate a regression along the lines of Eq. (1) but use wages in first difference instead of in levels as the dependent variable:

\Delta_{t+1,t} \log w_{ic(i)} = \alpha \log Pop_{c(i)t} + \eta_{c(i)} + u_i + e_{ic(i)t} \quad (2)

where Δ is used to note time differences between t and t + 1. Among a number of papers, De la Roca and Puga (2017) confirm that wage growth is stronger in larger cities. Because the structure of Eq. (2) is the same as that of Eq. (1) for the static estimation of agglomeration economies, it suffers from the same drawbacks. First, the association between wage growth and agglomeration could be explained by the sorting of workers with faster wage growth in larger cities. This could occur because "fast learners" tend to locate in larger cities or because the wage of workers who are predominantly located in larger cities (such as more educated workers) tends to increase faster. Although the result that wages grow faster in cities is frequently interpreted as evidence about faster learning in cities and knowledge spillovers, the mechanisms that drive it are unclear. Just like regressing wages in levels on a measure of urban scale in Eq. (1) does not tell us anything about the sources of static agglomeration economies, regressing wage growth on urban scale in Eq. (2) is equally uninformative about the sources of agglomeration dynamics. Interestingly, Wheeler (2008) shows that young workers tend to change jobs more often in larger cities, while the opposite holds for older workers. This type of evolution is consistent with a matching model where workers can find their "ideal match" faster in larger cities and then stick to it. Such a mechanism could explain both faster wage growth and eventually higher wages in larger cities.

Evidence about learning in cities can come from the fact that workers retain some benefits from agglomeration after they leave their city. De la Roca and Puga (2017) confirm this on Spanish data. Their findings suggest the existence of both a level effect of cities on wages (of the same magnitude as those discussed above) and a dynamic effect. Over the long run, workers in large cities seem to gain about as much from both effects.

To sum up, this discussion of agglomeration economies, which focuses mainly on workers and jobs, reaches a number of interesting conclusions. First, larger cities make workers more productive. There is both a static and a dynamic component to these gains. A static elasticity of wages with respect to city population of 0.03 implies that a worker receives a 23% higher wage when moving from a tiny city with population 5,000 to a large metropolis with a population of five million. Over time, dynamic effects could make this urban premium twice as large. While long-run gains close to 50% are not miraculous, they are nonetheless sizeable. In terms of policy implications, the temptation to "foster agglomeration effects" should be resisted. We are too far from knowing enough about the sources of agglomeration to implement any meaningful policy, not to mention the great heterogeneity in who gains from and who contributes to agglomeration gains. It remains nonetheless that the economic gains from urbanization are significant and urbanization should be embraced rather than resisted.
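As a quick check of this arithmetic, using only the figures already given in the text, the static premium implied by an elasticity of 0.03 for a move from a city of 5,000 to one of five million inhabitants is

w_{\text{large}} / w_{\text{small}} = \left( \frac{5{,}000{,}000}{5{,}000} \right)^{0.03} = 1000^{0.03} = e^{0.03 \ln 1000} \approx e^{0.207} \approx 1.23,

that is, a 23% static premium; doubling this premium through dynamic effects gives roughly 46%, the long-run gain "close to 50%" referred to above.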

3 Firm Dynamics Within Cities

This higher productivity of jobs in cities is only one facet of the issue. Jobs are usually treated as a veil in theoretical models of production. In practice, higher labor productivity is associated with doing different things and doing them differently. That is, to receive higher wages, workers need "better jobs". Firm dynamics is often the vector of these changes. More specifically, let us examine several aspects of firm dynamics in cities: innovation, firm creation and growth, and factor allocation and reallocation across firms.

Starting with innovation, the first salient feature of the geography of innovative activity is that research and innovation are much more concentrated than production in most industries. Interestingly, this tendency seems particularly strong for industries that are more intensive in skilled labor and in research and development. It is also the case that this concentration of research and development typically takes place in large metropolitan areas. These location patterns for innovative activity are consistent with the notion that cities have a positive effect on innovation just as they have on wages. More direct evidence can be found in Feldman and Audretsch (1999) and Carlino et al. (2007).


To measure innovation, Feldman and Audretsch (1999) count all new product innovations in US metropolitan areas for a broad set of technologies and sectors in 1982. They find no evidence of urban scale effects but find that same-sector specialization is strongly negatively associated with innovation, whereas a diversity of employment in technologically related industries is strongly positively associated with innovation. They also find strong positive innovation effects associated with the presence of smaller establishments.

Using the number of patents per capita as the dependent variable, Carlino et al. (2007) find evidence of strong agglomeration effects for innovative activity. Their estimate of the elasticity of patenting per capita with respect to employment density is 0.2. This is several times the estimates reported above for the corresponding elasticity of wages. Interestingly, Carlino et al. (2007) also find that this elasticity of innovation with respect to employment density or population size is not constant across the urban hierarchy. Patenting per capita appears to peak at around 5,700 jobs per square kilometer, or a city population size slightly below a million (a stylized illustration of how such a peak arises follows at the end of this discussion of innovation and cities).

While this evidence is highly suggestive that cities affect innovation, there is, to the best of our knowledge, no work which focuses on the effects of innovative activity in cities, such as its effects on urban growth (Duranton and Puga 2014). Regressing urban population growth on innovative activity would raise some obvious identification concerns. In addition, a simple theoretical argument suggests that the effect of innovation on urban growth need not be positive. Obviously, product innovation in the form of either an entirely new product or the capture of an established product from another location is expected to add to a city's employment. However, process innovation within a city can cut both ways. Employment will increase with process innovation only if greater productive efficiency and lower prices lead to a more than proportional increase in demand. In the opposite case, process innovation will imply a contraction of local employment. Remarkably, Carlino et al. (2007) show that Rochester, Buffalo, Cleveland, St Louis, and Detroit are all highly innovative cities. This suggests that, to some extent, the demise of these cities may be attributed to the fact that labor productivity increased much faster than demand in their industries.

Finally, innovative activity appears to change the nature of jobs in the cities where it takes place. As shown by Lin (2011), cities that patent more tend to have a greater proportion of what he labels "new work", that is, jobs that did not exist a few years before. New work is also fostered by a greater proportion of educated workers and a diversity of industries, two other attributes of large cities.

To conclude on the links between innovation and cities, the extant literature supports the notion that cities affect innovation either because of their sheer population size or because of the (diverse) structure of their production activities. The evidence about the effect of innovation on cities is more complex. Innovation within a given city affects the proportion of workers in new work. Other effects are either ambiguous or poorly documented. As we show below, further insights about the effects of innovation on cities can be gained by looking across cities.
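On the peak in patenting mentioned above, one common way to capture a non-constant elasticity is to add a quadratic term in log density to the estimating equation. The coefficients below are purely illustrative and are not Carlino et al.'s estimates; they are chosen only to show how a peak near the reported figure can arise:

import numpy as np

# Hypothetical specification: log(patents per capita) = a + b*log(d) + c*log(d)^2,
# with c < 0 so that patenting peaks at some density d*.
b, c = 1.04, -0.06
peak_density = np.exp(-b / (2 * c))   # first-order condition: b + 2*c*log(d) = 0
print(f"implied peak: {peak_density:,.0f} jobs per square kilometer")  # about 5,800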
Entrepreneurship is also closely associated with cities in several ways. First, cities affect entrepreneurship just as they affect wages and innovation. In a comprehensive analysis of the determinants of employment in new manufacturing start-ups across sectors in US cities, Glaeser and Kerr (2009) generate a rich harvest of facts. The first is the existence of scale economies. As a city grows larger, employment in new start-ups in this city increases more than proportionately. Depending on their specification, Glaeser and Kerr (2009) find an elasticity of employment in new start-ups per capita with respect to city scale between 0.07 and 0.22. City population, city-industry employment, and sector effects explain around 80% of the variation in start-up employment across cities and sectors. Glaeser and Kerr (2009) also find that the presence of many small suppliers has a strong effect on employment in start-ups. In addition, they find evidence of mild Marshallian effects associated with input-output linkages, labor market pooling, and spillovers. Finally, city demographics have only limited explanatory power, as does their measure of "entrepreneurial culture".

The other key feature about the supply of entrepreneurs is that there is a strong local bias in entrepreneurship. Entrepreneurs tend to create their start-up in the place where they were born and/or where they have lived and worked before becoming entrepreneurs. This important fact has been documented by Figueiredo et al. (2002) for Portugal and by Michelacci and Silva (2007) for Italy and the USA, and it has been confirmed by several other studies in developed economies. Figueiredo et al. (2002) also show that when entrepreneurs choose a new location, this choice is strongly governed by agglomeration economies and proximity to large cities.

After looking at the urban determinants of entrepreneurship, we now turn to the effects of entrepreneurship on cities. It has been shown repeatedly that entrepreneurship plays a key role in urban evolutions. The key fact here is that growth in a city and sector over a period of time is strongly correlated with the presence of small establishments in that city and sector at the beginning of the period. This fact was first documented by Glaeser et al. (1992) and has been confirmed for other countries and time periods by many other studies.

Just as with many of the correlations discussed above, the strong link between small firms and employment growth raises a key identification concern about the direction of causality. Unfortunately, this issue has been neglected by the literature, perhaps because the standard regression uses growth over a period as the dependent variable and establishment size at the beginning of the period as the explanatory variable. However, using a pre-determined variable as an explanatory variable in a regression does not guarantee its exogeneity. Local entrepreneurs could enter in large numbers in a city and sector if they foresee strong future demand. That expectations of future growth should trigger entry today is only natural: that is the nature of business. Moreover, it is difficult to think of instruments that would predict establishment size in a city and sector but be otherwise uncorrelated with subsequent growth.

To clarify the meaning of the relationship between small establishments and high subsequent growth, Glaeser et al. (2010) do something quite different. They look at whether the presence of many small firms in a city and sector is driven by the demand for entrepreneurship or its supply.
To the extent that the demand for entrepreneurship can be captured by higher sales per worker, this does not appear to be the case.
They also find limited evidence about the importance of lower labor costs or of entrepreneurs sorting into high-amenity cities. They find stronger evidence about the importance of the proportion of university graduates (particularly in more skilled industries), but that still does not explain away the effect of having lots of small establishments. While still preliminary, this type of evidence points to some unspecified supply effects. More entrepreneurial cities happen to have a greater supply of entrepreneurs, and the literature has thus far been unable to trace this further.

Turning finally to factor allocation and reallocation, the literature that examines these issues makes two important claims. The first is that a large fraction of productivity growth at the country level can be accounted for by the reallocation of factors from less productive to more productive firms. A large share of productivity growth can be accounted for by a churning process where low-productivity firms are replaced by new and more productive start-ups. These important findings have been confirmed for many countries (Bartelsman et al. 2004).

The second important claim made by the reallocation literature is that "misallocation" can account for a large share of existing productivity differences across countries. To understand this point better, consider the influential work of Hsieh and Klenow (2009). They first note that, in equilibrium, the marginal product should be equalized across firms. If the demand for the varieties produced by firms has a constant elasticity of substitution, this implies an equalization of the product of their price and their "true productivity" (the ability of firms to produce output from inputs, also referred to as Quantity-TFP). This product – price times true productivity – is what is estimated as "total factor productivity" in most productivity exercises. We may call this second quantity "apparent productivity" (or Revenue-TFP). Obviously, firms' apparent productivities are never equalized in real data. Hsieh and Klenow (2009) interpret this as evidence of factor misallocation. Taking the highly dispersed distribution of manufacturing productivity in China and India, they calculate very large potential costs from such misallocation. Acknowledging that a perfectly efficient allocation may be impossible, they compute that the productivity gains for manufacturing in China and India would still be about 50% if their level of misallocation could be reduced to that observed in the USA (a numerical sketch of this logic appears at the end of this section).

To the best of our knowledge, there is no study that attempts to relate greater churning/reallocation at the firm level to higher productivity growth at the urban level. However, there is a strong suspicion that larger cities should exhibit more churning. This is because, as already argued, larger cities are more innovative, experience more entry and exit, and have a greater fraction of their workforce in "new work". At the same time, there is no indication that this greater amount of churning in larger cities is associated with higher productivity growth in those cities, unlike what occurs at the country level. We actually know little about productivity growth in cities. According to Lin (2011), the greater proportion of workers employed in new work in larger cities is not associated with faster productivity growth.
In a rare study of the broader determinants of productivity growth in Italian cities, Cingano and Schivardi (2004) highlight the importance of both specialization and employment size.
But given that specialization and employment size are negatively correlated, their positive effects arguably cancel out. Hence, more churning does not appear to lead to faster productivity growth in cities. To confirm this conclusion, note that workers are somewhat mobile across cities: if more churning were associated with faster productivity growth in larger cities, this should imply a divergence in population growth rates. There is no evidence of such divergence.

This lack of result regarding the link between churning and productivity should not be taken as negative evidence against the reallocation literature. As argued in the next section, it is possible that reallocation takes place not only within cities but also across cities.

Turning to the second claim about misallocation, Combes et al. (2012) show that the distribution of firm productivity is unambiguously more dispersed in larger cities in France. In the framework of Hsieh and Klenow (2009), this would be interpreted as greater misallocation in larger cities, which seems hard to believe. The evidence about static agglomeration effects discussed above is instead best interpreted as agglomeration economies leading to a better allocation of resources (in a broad sense) in larger cities. When performing a productivity decomposition, Combes et al. (2012) find a similar covariance between establishment size and productivity in large and small cities, which suggests a similar level of efficiency in the allocation of factors to firms across cities of all sizes.

To sum up, the evidence about firm dynamics and cities presented in this section is puzzling. Larger cities seem to be more innovative, be more entrepreneurial, experience more churning and reallocation, and generally enjoy a greater "economic dynamism". At the same time, they do not appear to enjoy most of the benefits associated with such dynamism, since neither productivity nor population appears to increase faster in larger cities. Of course, these conclusions need to be taken cautiously given the paucity of studies, including their complete absence for cities in developing countries.
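Before moving on, the Hsieh-Klenow misallocation logic invoked above can be made concrete with a small numerical sketch. The code below uses the standard one-factor CES simplification of their framework with invented parameter values (it is not the authors' actual estimation): firms draw a "true productivity" and a distortion wedge, labor is allocated accordingly, and aggregate TFP is compared with the wedge-free allocation.

import numpy as np

rng = np.random.default_rng(0)
sigma = 3.0                     # elasticity of substitution between varieties (assumed)
rho = (sigma - 1) / sigma
n, L = 100_000, 1.0             # number of firms, total labor

A = np.exp(rng.normal(0.0, 1.0, n))      # "true" productivity (Quantity-TFP)
tau = np.exp(rng.normal(0.0, 0.4, n))    # idiosyncratic distortion wedges

def aggregate_tfp(weights):
    # CES aggregate output per unit of labor, given relative labor allocations
    Li = L * weights / weights.sum()
    return np.sum((A * Li) ** rho) ** (1 / rho) / L

tfp_actual = aggregate_tfp(A ** (sigma - 1) / tau ** sigma)   # distorted allocation
tfp_efficient = aggregate_tfp(A ** (sigma - 1))               # wedges equalized
print(f"TFP gain from removing misallocation: {tfp_efficient / tfp_actual - 1:.0%}")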

4 City Functionality, Urban Systems, and Policies

The answer to the apparent puzzle raised above is that, when thinking about economic growth, it is wrong to think of cities as self-contained units. Cities are best viewed as small open economies that interact intensively with other cities and rural areas. They are part of an "urban system". This implies that innovation, churning, and reallocation are best studied across the entire system of cities, not within a single city.

Starting with innovation, recall that larger cities offer many advantages for both product and process innovation. More specifically, as highlighted by many, cities favor the circulation and cross-fertilization of ideas. This naturally leads to more product innovations, which is consistent with the evidence of Feldman and Audretsch (1999) discussed above. For process innovation, Duranton and Puga (2001) underscore the greater availability of intermediate goods in large cities, which allows firms to proceed through trial and error at a faster pace. Put differently, the greater ability of larger cities to innovate may just be another manifestation of agglomeration economies.
The key difference with many static aspects of agglomeration economies, such as thicker local labor markets, is that with dynamic effects co-location is not needed all the time. More precisely, spillovers may matter for developing an innovation, but after this is done, co-location is no longer needed. Quite the opposite: larger cities are more expensive places to produce. After the dynamic benefits from agglomeration have been exploited, it can make sense for firms to relocate. Often, the entire firm does not need to relocate, since only the production of particular products is concerned.

Patterns of establishment relocations in France are highly consistent with this type of product cycle. As shown by Duranton and Puga (2001), about 75% of French establishments that relocated did so from a city with above-median diversity to a city with below-median diversity and above-median specialization in the same sector. In addition, as documented by Fujita and Ishii (1998), large Japanese multinationals in the electronics sector produced their newest products in "trial" plants near Tokyo and Osaka, while previous generations of products were produced in rural locations in Japan and in less advanced countries in Asia. Hence, as their products mature, firms still search for agglomeration economies but put a greater weight on the benefits of specialization. Large cities act as nurseries for new goods and new products. Once mature, new goods and products are best produced in more specialized places.

Cities also specialize by sector. However, this tendency, while still present in the data, has diminished over time, as documented by Duranton and Puga (2005). The same authors also document a rise in the functional specialization of cities, with the emergence of cities specialized in management-type functions whereas others specialize more in production activities. This rise in functional specialization is rationalized by Duranton and Puga (2005) in a model where lower communication costs make it easier for firms to separate management from production. Since these activities benefit from very different types of agglomeration economies, such separation is beneficial provided the cost of separating activities is low enough. In turn, this separation of activities reinforces the functional specialization of cities. These multiple dimensions of specialization are part of well-functioning urban systems in more advanced countries.

Adding to this, the notion of cities being specialized by functions and activities is not static. The process of continuous location and relocation of economic activity is a crucial aspect of the growth of those activities. To take a simple example, when George Eastman developed a revolutionary new technology in the photographic industry in Rochester, the industry relocated from New York to Rochester. Then, much later, as the technology developed by Eastman was itself superseded by the digital revolution, Rochester lost its status as the capital of the photographic industry.

That different cities specialize in different functions and are able to change their specialization after negative shocks presupposes a fair amount of "mobility" across cities. The first important dimension of mobility regards goods and services. It would make little sense for cities to narrowly specialize in an activity if its output cannot be exported. Continuously changing patterns of specialization also require labor mobility.
For instance, Kerr (2010) documents that after "breakthrough" innovations, more innovations tend to take place in the same location for the same technology. This growth in patenting, in turn, depends on the mobility of scientists and engineers.
Interestingly, the adjustment appears faster in the USA for technologies that depend more heavily on immigrant inventors, who are more mobile.

While the foregoing discussion describes well what happens within the urban systems of more advanced economies, it is a far less accurate depiction of the situation of cities in developing countries. For instance, most very large cities in developing countries are still major manufacturing centers, whereas manufacturing production is mostly absent from the largest cities of Europe and North America. This lack of urban differentiation may be at the root of the problem. Urban systems in developing countries may be much less efficient than in more advanced countries because cities are much less differentiated in terms of functions. More specifically, this lack of differentiation in urban functionality may hamper the dynamism of cities in developing countries. The largest cities there are burdened by many ancillary activities such as basic manufacturing and call centers. These add to urban crowding without adding to agglomeration benefits. On the other hand, smaller cities in developing countries often lag far behind, and gaining some of these ancillary activities would be crucial for their development.

This said, a lack of well-functioning urban systems – however important (and neglected in urban policy) – is not the only cause of the lower efficiency of cities in developing countries relative to their counterparts in advanced economies. Non-urban factors such as weak national institutions and poor technology certainly play a role. Urban factors which hinder the functional differentiation of cities also have a direct negative effect on the efficiency of cities. For instance, as we discuss below, high transportation costs limit the specialization of cities by reducing their ability to trade. At the same time, even if we abstract from these effects, high transportation costs also affect the price of goods purchased by local consumers and reduce market access for local producers. In the rest of this section, we examine a number of urban factors that reduce both the efficiency of the urban system and the efficiency of cities directly. Cities in developing countries often act as a brake on growth, whereas they should be a key driver of economic development.

The first key difference between cities in developing and more advanced countries regards the functioning of their labor markets. In most developing countries, there is a well-known duality in the labor market, which usually comprises a large informal sector alongside the formal sector. Aside from its detrimental implications for workers in the informal sector, this duality hinders urban development in several ways. First, it has been accused of inducing too much migration towards the largest cities, where most of the formal sector is located. Duality may also limit mobility across cities, since jobs in the informal sector tend to be filled by word-of-mouth through social connections which newcomers lack. High barriers to "good" jobs in the formal sector may also weaken the incentives of workers to improve their skills locally and thus limit the scope of agglomeration benefits.

To mitigate the effects of labor market duality, three broad types of policies can be envisioned. The first is to improve the working of labor markets. While this objective is certainly laudable, a discussion of this class of policies would go beyond the scope of this chapter.

The second type of policy is to foster local job creation through "place-based" policies. Such policies typically involve subsidies, tax holidays, or, perhaps more importantly, other exemptions from, for instance, labor laws within some areas. These areas are sometimes referred to as "special economic zones". Place-based policies in developing countries also often offer some form of training or capacity building. In more developed countries, these policies often aim at helping well-delineated but particularly poor areas, with great emphasis on fiscal tools. In developing countries, they often try to foster the development of more advanced areas, sometimes entire regions, through a wide variety of tools, with a special emphasis on giving access to factors of production and a more stable institutional framework. These differences aside, the literature on place-based policies suggests limited returns from these interventions (see Duranton and Venables (2018) for a survey).

The third class of policies attempts to foster job creation in a particular locality by helping firms in a given sector. These policies are usually referred to as "cluster" policies and follow from the work of Michael Porter (1990). They often entail the development of supportive institutions and infrastructure using public subsidies and various types of fiscal incentives. The review of the literature in Combes and Gobillon (2015) invites skepticism regarding the possible benefits of cluster policies.

The second key difference between cities in developing and more advanced countries regards the functioning of their land markets. Like labor markets, land markets in developing cities are characterized by a duality, between land used with appropriate property titles and leases and squatted land. Recent empirical research has focused on the effects of the lack of effective, formal property titles, which could prevent residents of squatter settlements from using their house as collateral. Informal land markets may thus be a major barrier to enterprise development. The empirical evidence about the relaxation of credit constraints associated with "titling" policies is weak. Recent work points instead to increases in labor supply (Field 2007) and to the adoption of more middle-class values and attitudes (Di Tella et al. 2007). While this evidence is relatively optimistic about the merits of titling policies, it must be noted that the existing literature focuses nearly exclusively on residential land. The extent of land illegality for commercial and production purposes (from illegal street vendors to squatter manufacturing) is poorly measured, and the solutions to these problems are not well developed.

The third key difference between cities in developing and more advanced countries regards infrastructure, particularly road infrastructure. Two strands of research need to be distinguished here. The first finds its roots in international trade and focuses on the estimation of the effect of "market potential" variables. The market potential of a city is usually computed as the sum of the income (or population) of other cities weighted by their inverse distance to the city under consideration. Assuming transportation costs and other trade frictions associated with distance, many models of international and inter-regional trade generate the prediction that a location's income will be determined by its market access (Krugman 1991).
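The market potential measure just described is straightforward to compute. Below is a minimal sketch with invented incomes and distances for four cities; variants in the literature differ in how they treat a city's own income and in the distance-decay function, but the inverse-distance sum is the baseline:

import numpy as np

Y = np.array([10.0, 4.0, 6.0, 2.0])       # incomes of four cities (made up)
D = np.array([[  0, 100, 250, 400],
              [100,   0, 150, 300],
              [250, 150,   0, 200],
              [400, 300, 200,   0]], dtype=float)   # pairwise distances (made up)

np.fill_diagonal(D, np.inf)               # exclude each city's own income
MP = (Y / D).sum(axis=1)                  # MP_i = sum over j != i of Y_j / D_ij
print(MP)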


The literature offers strong empirical support regarding the importance of market access for cities in developing countries (Donaldson 2015).

The second strand of literature focuses more closely on the effects of infrastructure. Baum-Snow's (2007) pioneering work finds that the construction of the interstate highway system was a major impetus behind the suburbanization of US cities. Duranton and Turner (2012) also find that more kilometers of interstate highways in US metropolitan areas in the early 1980s led to faster population growth over the subsequent 20 years. This type of approach is also being applied to developing countries. In a remarkable piece of work, Donaldson (2018) documents the effects of the construction of India's railroad network by its colonial power. He shows that railroads increased trade and reduced price differences across regions. Even more importantly, railroads increased real incomes and welfare. To minimize identification problems, he compares the network that was built to other networks that were considered but never developed. In line with some of the arguments advanced above about the importance of transportation infrastructure for the decentralization of manufacturing activity away from large metropolises, Baum-Snow et al. (2017) underscore the importance of railroads in the decentralization of manufacturing production in China.

Storeygard (2016) provides evidence about the importance of inter-city transportation costs for inland African cities. Using new roads data for Africa and satellite data ("lights at night") to estimate economic activity, he assesses the effect of higher transportation costs. To circumvent the endogeneity of transportation costs (roads may be built to access growing cities), he uses arguably exogenous variations in oil prices (a stylized sketch of this strategy appears at the end of this section). He finds an elasticity of economic activity with respect to transportation costs of about 0.2. All these findings are suggestive of the profound and long-lasting effects of major transportation infrastructure (see Redding and Turner (2015) for a review). One needs to keep in mind nonetheless that major transportation networks are extremely costly investments.

The last key difference between cities in developing and more advanced countries regards the effects of the favoritism by governments of the largest cities. While the reasons for primate city favoritism are still debated (Henderson 2005), there is little doubt that such favoritism takes place in many different ways. As argued in Henderson (2005), primate city favoritism harms the favored primate city by making it bigger than it should be. It also harms smaller cities, which are, in effect, heavily taxed. The gap that is created between the primate city and other cities may also have negative dynamic effects, since for most educated workers there is nowhere to go but the primate city. As a result, this may reduce the circulation of knowledge across cities. Reducing primate city favoritism and providing smaller cities with better local public goods (including education and health) are certainly an important part of any solution.
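The instrumental-variable strategy described above for Storeygard (2016) can be illustrated with a textbook two-stage least squares on simulated data. Everything below is invented for illustration, including the sign convention, which treats higher transport costs as reducing activity; this is not the paper's actual data or estimation.

import numpy as np

def two_stage_least_squares(y, x, z):
    # Textbook 2SLS with one endogenous regressor x and one instrument z
    Z = np.column_stack([np.ones_like(z), z])
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]        # first stage
    X_hat = np.column_stack([np.ones_like(x_hat), x_hat])
    return np.linalg.lstsq(X_hat, y, rcond=None)[0][1]      # second-stage slope

rng = np.random.default_rng(1)
n = 1_000
oil = rng.normal(size=n)                                    # instrument: oil prices
u = rng.normal(size=n)                                      # unobserved confounder
log_cost = 0.8 * oil + 0.5 * u + rng.normal(size=n)         # endogenous transport costs
log_activity = -0.2 * log_cost + 0.5 * u + rng.normal(size=n)
print(two_stage_least_squares(log_activity, log_cost, oil)) # close to -0.2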

5 Conclusions

For individual workers, cities in developing countries appear to bring significant benefits both in the short run and in the long run. However, when taking a broader look, the urban systems of developing countries appear to involve far less functional differentiation across cities than in more advanced economies. Such differentiation, with different cities playing different roles in the urban system, is important for the process of growth and development to proceed smoothly. Larger cities innovate and manage; smaller cities often produce a narrow range of goods. Having larger cities do everything, as they often do in developing countries, reduces their dynamism and holds back small cities, which remain stagnant. A variety of policies can be envisioned to solve this problem. The three most promising areas are: general policies to improve the functioning of labor markets, ending primate city favoritism, and promoting the development of major infrastructure to connect cities.

6 Cross-References

▶ Changes in Economic Geography Theory and the Dynamics of Technological Change
▶ Cities, Knowledge, and Innovation
▶ Commuting, Housing, and Labor Markets
▶ Factor Mobility and Migration Models
▶ Geography and Entrepreneurship
▶ New Economic Geography and the City
▶ The Geography of Innovation

References

Bartelsman E, Haltiwanger J, Scarpetta S (2004) Microeconomic evidence of creative destruction in industrial and developing countries. Processed, University of Maryland
Baum-Snow N (2007) Did highways cause suburbanization? Q J Econ 122(2):775–805
Baum-Snow N, Brandt L, Henderson JV, Turner MA, Zhang Q (2017) Roads, railways, and decentralization of Chinese cities. Rev Econ Stat 48(3):435–448
Carlino GA, Chatterjee S, Hunt RM (2007) Urban density and the rate of invention. J Urban Econ 61(3):389–419
Charlot S, Duranton G (2004) Communication externalities in cities. J Urban Econ 56(3):581–613
Ciccone A, Hall RE (1996) Productivity and the density of economic activity. Am Econ Rev 86(1):54–70
Cingano F, Schivardi F (2004) Identifying the sources of local productivity growth. J Eur Econ Assoc 2(4):720–742
Combes P-P, Gobillon L (2015) The empirics of agglomeration economies. In: Duranton G, Henderson V, Strange W (eds) Handbook of regional and urban economics, vol 5A. Elsevier, Amsterdam, pp 247–348
Combes P-P, Duranton G, Gobillon L (2008) Spatial wage disparities: sorting matters! J Urban Econ 63(2):723–742
Combes P-P, Duranton G, Gobillon L, Puga D, Roux S (2012) The productivity advantages of large cities: distinguishing agglomeration from firm selection. Econometrica 80(6):2543–2594
De la Roca J, Puga D (2017) Learning by working in big cities. Rev Econ Stud 84(1):106–142
Di Tella R, Galliani S, Schargrodsky E (2007) The formation of beliefs: evidence from the allocation of land titles to squatters. Q J Econ 122(1):209–241
Donaldson D (2015) The gains from market integration. Annu Rev Econ 7(1):619–647
Donaldson D (2018) Railroads of the Raj: estimating the impact of transportation infrastructure. Am Econ Rev 108(4):899–934
Duranton G (2015) Growing through cities in developing countries. World Bank Res Obs 39(1):39–73
Duranton G, Puga D (2001) Nursery cities: urban diversity, process innovation, and the life cycle of products. Am Econ Rev 91(5):1454–1477
Duranton G, Puga D (2005) From sectoral to functional urban specialisation. J Urban Econ 57(2):343–370
Duranton G, Puga D (2014) The growth of cities. In: Aghion P, Durlauf S (eds) Handbook of economic growth, vol 2. North-Holland, Amsterdam, pp 781–853
Duranton G, Turner MA (2012) Urban growth and transportation. Rev Econ Stud 79(4):1407–1440
Duranton G, Venables AJ (2018) Place-based policies for development. National Bureau of Economic Research Working Paper 24562
Ellison G, Glaeser EL, Kerr WR (2010) What causes industry agglomeration? Evidence from coagglomeration patterns. Am Econ Rev 100(3):1195–1213
Feldman MP, Audretsch DB (1999) Innovation in cities: science-based diversity, specialization and localized competition. Eur Econ Rev 43(2):409–429
Field E (2007) Entitled to work: urban property rights and labor supply in Peru. Q J Econ 122(4):1561–1602
Figueiredo O, Guimaraes P, Woodward D (2002) Home-field advantage: location decisions of Portuguese entrepreneurs. J Urban Econ 52(2):341–361
Fujita M, Ishii R (1998) Global location behavior and organizational dynamics of Japanese electronics firms and their impact on regional economies. In: Chandler AD Jr, Hagström P, Sölvell Ö (eds) The dynamic firm: the role of technology, strategy, organization and regions. Oxford University Press, Oxford, pp 343–383
Glaeser EL, Kerr WR (2009) Local industrial conditions and entrepreneurship: how much of the spatial distribution can we explain? J Econ Manag Strateg 18(3):623–663
Glaeser EL, Resseger MR (2010) The complementarity between cities and skills. J Reg Sci 50(1):221–244
Glaeser EL, Kallal H, Scheinkman JA, Shleifer A (1992) Growth in cities. J Polit Econ 100(6):1126–1152
Glaeser EL, Kerr WR, Ponzetto GAM (2010) Clusters of entrepreneurship. J Urban Econ 67(1):150–168
Greenstone M, Hornbeck R, Moretti E (2010) Identifying agglomeration spillovers: evidence from winners and losers of large plants openings. J Polit Econ 118(3):536–598
Henderson JV (2005) Urbanization and growth. In: Aghion P, Durlauf SN (eds) Handbook of economic growth, vol 1B. North-Holland, Amsterdam, pp 1543–1591
Hsieh C-T, Klenow PJ (2009) Misallocation and manufacturing TFP in China and India. Q J Econ 124(4):1403–1448
Kerr WR (2010) Breakthrough inventions and migrating clusters of innovation. J Urban Econ 67(1):46–60
Krugman PR (1991) Increasing returns and economic geography. J Polit Econ 99(3):484–499
Lin J (2011) Technological adaptation, cities, and new work. Rev Econ Stat 93(2):554–574
Michelacci C, Silva O (2007) Why so many local entrepreneurs? Rev Econ Stud 89(4):615–633
Porter ME (1990) The competitive advantage of nations. Free Press, New York
Redding SJ, Turner MA (2015) Transportation costs and the spatial organization of economic activity. In: Duranton G, Henderson V, Strange WC (eds) Handbook of regional and urban economics, vol 5B. Elsevier, Amsterdam, pp 1339–1398
Storeygard A (2016) Farther on down the road: transport costs, trade and urban growth in sub-Saharan Africa. Rev Econ Stud 83(3):1263–1295
Wheeler CH (2008) Local market scale and the pattern of job changes among young men. Reg Sci Urban Econ 38(2):101–118

Changes in Economic Geography Theory and the Dynamics of Technological Change

Riccardo Crescenzi

Contents
1 Introduction
2 The Linear Model of Innovation: The A-Spatial Benchmark
3 Physical "Distance" Between Innovative Agents and Knowledge Flows
4 Innovative Agents "in Context": Local Specialization Patterns and Institutions
4.1 Economic Places: Industrial Specialization
4.2 Relational-Institutional Places
5 Bringing Different Approaches Together (1): Non-spatial Proximities, "Integrated" Frameworks, and Local-Global Connectivity
6 Bringing Different Approaches Together (2): Local-Global Connectivity
7 Conclusions
8 Cross-References
References

Abstract

This chapter looks at the economic geography literature and sets out to explore the evolution of its intersections with innovation theories. The replacement of the linear model with more sophisticated conceptualizations of the process of innovation has made it possible to account for persistent disparities in innovative performance across space and has motivated researchers to incorporate the role of space and places in the analysis of innovation processes. From the physical-metrical approach of geography as distance to the emphasis on specialization and diversification patterns (geography as economic place), institutional-relational factors, non-spatial proximities, and "integrated" frameworks, economic geography theory has substantially evolved. The research frontier has moved onto the analysis of the territorial innovation impacts of global flows of capital, skills, knowledge, and ultimately value (through global value chains) and their implications for the rationale, design, and implementation of innovation policies.

R. Crescenzi (*)
Department of Geography and Environment, London School of Economics, London, UK
e-mail: [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_35



Keywords

Innovation · Geography of innovation · Regional innovation · Linear model · Systems of innovation · Innovative cities · Global connectivity · Global value chains (GVCs) · Innovation policy

1 Introduction

In an increasingly globalized world of intensified competition with ever-shorter product life cycles, innovation is a driver of regional and national competitiveness. This is certainly good news for developing and emerging countries and regions: economic performance can be boosted by stronger indigenous innovative capabilities but also by better accessibility to external knowledge. New windows of opportunity are being opened by innovation and technological change for new actors to emerge in the international technological competition arena.

However, a large body of empirical evidence suggests that these opportunities are far from "universal": knowledge generation and absorption are highly localized, and diffusion follows very complex (and ever-changing) patterns of flows of capital, skills, knowledge, and value (through global value chains). In both developing and developed economies, a small number of "hotspots" are pushing the technological frontier forward, followed by a set of emerging second-tier "imitative systems" and a large number of territories that exhibit little innovative dynamism and only marginal benefits from technological opportunities. Innovation is certainly spreading both internationally – as suggested, for example, by the success of China and India – and "nationally," with new territories gaining momentum in the "new" member states of the EU, but only in a very circumscribed set of new suitable "locations." This is true in Europe and the United States, where around 70% of total patenting remains concentrated in the 20 most innovative regions, but also in China and India, where these concentration patterns are even more significant (Crescenzi and Rodríguez-Pose 2011).

Rather than waning, such spatial innovation disparities are increasing in both developed and developing countries, shattering hopes that rapid progress in information and communication technologies (ICT) and the dismantling of barriers to the movement of labor and capital could automatically decouple innovative performance from previous localized patterns of technological accumulation and contextual socio-institutional and geographical conditions. Conversely, the spatial concentration of knowledge generation in a few leading "hotspots" boosts their attractiveness for inward investment in innovative activities, further reinforcing the localization of the key nodes of "global" knowledge networks generated by the mobility of both capital (e.g., by multinational firms and their intra- and inter-firm connectivity; see Crescenzi and Iammarino 2017) and skilled labor (Crescenzi and Gagliardi 2018), generating a cumulative self-reinforcing process.

Technological change and innovation – with their capability to generate new economic opportunities – are features of cities, clusters, and regions whose contribution toward national and global systems and networks is highly asymmetric. This, therefore, calls for appropriate frameworks of understanding able to capture the two-way nexus between geography and innovation. Coherently with this perspective, this chapter aims to critically review the existing literature on territorial innovation dynamics in order to shed light on how progressively more sophisticated conceptualizations of the role of geography in innovation dynamics have been developed, and how they can address the complexity of the "real" world processes discussed above, accounting in a more effective manner for the local impacts of the ongoing processes of global integration and disintegration.

When looking at how the literature has conceptualized the economic geography of innovation dynamics, it is possible to identify four major streams of literature (Crescenzi 2014):

(i) Being based on physical-metric space, the first stream of literature has analyzed the role of geographical distance between innovative agents in shaping their innovative capabilities.
(ii) The second stream, instead, has focused on geography as an "economic place," looking at how local sectoral and functional specialization patterns shape the generation of innovation.
(iii) A third set of contributions has concentrated on "institutional-relational places," looking at the impact on innovation of the rules and patterns shaping the interactions between innovative agents in a given locality.
(iv) The final set of academic works has developed the idea that economic and institutional-relational processes can be decoupled from geographical proximity, giving rise to alternative ("economic" and/or "institutional-relational") non-spatial proximities, global networks, and value chains.

Following the foregoing categorization, this chapter starts off by briefly reviewing the archetypical a-spatial approach, i.e., the linear model of innovation. The linear sequencing from basic into applied research and innovative products or processes leaves no conceptual room for geographical dynamics. The subsequent section looks at the literature that abandons the view of knowledge as a public good in order to explore the role of physical geographical distance in making knowledge a local quasi-public good. The fourth section places innovation "in context" by discussing (a) the influence of economic places – local agglomeration and specialization patterns – on the innovation process, by looking at how economists and geographers have tried to identify the type of sectoral specialization that is most conducive to innovation, and (b) the role of local institutions, which is analyzed by reviewing the literature on regional systems of innovation (RSI), where the focus is on institutional-relational places. The fifth section reviews recent research based on multidimensional conceptualizations of proximity that broaden the analytical focus to non-spatial proximities and global connectivity as determinants of local innovative performance. The final section concludes with some reflections on public policies for innovation.

2 The Linear Model of Innovation: The A-Spatial Benchmark

The linear model of innovation has for a long time been the most influential theoretical framework for the understanding of the economic impact of science and technology. It postulates that all innovations result from basic science (Godin 2006): conducted in the research laboratories of universities and government research institutions, basic science produces new knowledge that is passed on to the applied science laboratories of private companies, where it is prepared for translation into commercial products. The linear model conceives the innovation process as a one-way path: Basic science ➔ Applied science ➔ Development ➔ Commercialization and Diffusion. This view also implies that basic science creates positive externalities in the form of public knowledge: underinvestment in basic research must be expected in the absence of government intervention. The allocation of public resources to basic science is expected to maximize externalities that allow for the universal diffusion of knowledge as a public good.

Empirically, the reasoning behind the linear model lies at the core of econometric studies examining the link between R&D and patents, in the first instance, followed by that between patents and economic growth. These analyses are based on knowledge production functions (KPFs), which allow for an investigation of the causal relation between productivity growth, unobservable knowledge capital, its observable input (R&D) as well as output (typically patents), and further factors (a canonical formulation is sketched at the end of this section). Based on firm-level data, these studies are mostly conducted by "mainstream economists."

The linear model of innovation was particularly influential in the postwar period, when it shaped US science and technology policies, and it remains popular with policy-makers in the twenty-first century, as evidenced by the targets for R&D spending to GDP ratios set in the EU's Lisbon Agenda or by the contemporary policy focus on centers of excellence that still survives in the innovation policies of several countries. Two major reasons explain the lasting influence of the linear model. First, the model conveys an unequivocal normative message: policy-makers should invest in basic research to maximize innovative potential. Second, national statistical offices and international organizations have reified "basic science," "applied science," and "development" into standard categories for the collection of data on innovative efforts, hardening the model as a concrete reference for policy discussions and transforming it into a "social fact" (Godin 2006).

The most fundamental critique of the linear approach aims at the core of the model, i.e., its linear character. The latter has been criticized for failing to reflect the complexity of innovation processes and the heterogeneity of its dynamics. Kline and Rosenberg (1986, p. 285) argue that "innovation is neither smooth nor linear, nor often well behaved." These critics consider the production of new technological knowledge an interactive process between multiple agents. Since this process is assumed to involve continuous feedback, the advocates of this view reject the linear model's conceptualization of innovation processes as a one-way sequence of steps.
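For reference, a canonical Griliches-style KPF of the kind underlying these studies can be written as follows; this formulation is a standard one from the productivity literature rather than reproduced from this chapter:

\log P_{it} = \alpha + \beta \log R_{it} + \gamma' Z_{it} + \varepsilon_{it},

where P_{it} is the innovative output (typically patents) of firm or region i at time t, R_{it} is R&D input, and Z_{it} collects further factors; β is then interpreted as the elasticity of innovative output with respect to research effort.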


The creation of new knowledge is a socially embedded, interactive process. It is shaped by the interactions between innovative agents that, in their turn, are fundamentally influenced by physical space (which can facilitate or hamper their contacts) and by the places in which they are embedded, as part of local industrial specialization processes, technological trajectories, and institutional modes of innovation.

3 Physical "Distance" Between Innovative Agents and Knowledge Flows

Once the view of knowledge as a pure public good – at the basis of the linear model – is replaced by a more realistic appreciation of its actual scope, geography as physical distance immediately becomes a fundamental component for the understanding of innovation processes. Knowledge has only a few of a public good's characteristics: it is non-rivalrous and only to a limited extent excludable. In this regard, the literature on the role of geographical distance in innovation processes shares some common ground with the a-spatial linear model, which assumes that knowledge production gives rise to external economies in the form of public knowledge. However, while in the linear model the location of innovative agents is irrelevant to their capability to benefit from these externalities, the geographical literature considers knowledge a spatially bounded quasi-public good whose circulation is largely restricted within the functional borders of the area where it is generated.

When looking at the spatial diffusion of knowledge flows, a crucial distinction is made between codified and tacit knowledge. The former is assumed to be relatively cheap to transfer since it can be expressed in a set of codes or instructions, distributed via communication channels (such as the Internet), and accessed by anybody familiar with the respective symbol system (e.g., language). Conversely, tacit knowledge is more expensive to transfer over long distances because – due to its higher complexity and context-dependency – it is not codifiable. The relatively high cost of transferring tacit knowledge across space renders this type of knowledge geographically "sticky," making face-to-face (F2F) contact an economically efficient means for its transmission. Encompassing verbal, physical, non-intentional and intentional, as well as contextual elements, F2F contacts allow for the communication of complex, contextual messages and minimize free-rider problems by promoting the development of trust (Storper and Venables 2004).

The importance of F2F contacts can be interpreted as a pivotal factor underlying the spatial clustering of innovative activities: the complexity and context-dependency of knowledge flows associated with innovative activities make the latter dependent on F2F – "an intrinsically spatial communication technology" (Crescenzi and Rodríguez-Pose 2011, p. 379). The dependency on F2F contacts may thus induce innovative actors to locate close to each other, which in turn leads to the emergence of geographical clusters of highly innovative agents. In line with this conceptualization, geographical distance plays a major role in innovation processes: geographical proximity is deemed to facilitate the transmission of imperfectly appropriable but spatially sticky knowledge.

Empirically, a large body of research on localized knowledge spillovers (LKS) examines the importance of geographical proximity for the dissemination of knowledge (for a review see Döring and Schnellenbach 2006 and Breschi 2011): shifting from firm-based KPFs to regions as units of observation, this stream of literature finds empirical support for the relevance of geographically mediated knowledge spillovers and identifies evidence of geographically bounded spillovers, measuring their spatial extent (Döring and Schnellenbach 2006). A second stream of empirical literature has used patent citations to track the spatial diffusion of patented inventions, suggesting that patent citations display a high degree of spatial autocorrelation: inventors refer to previous patents originating in the same city more frequently than does a control group of comparable patents.

When it comes to the design of regional innovation policies, the consideration that geographical distance acts as a barrier for the diffusion of knowledge flows leads to the acknowledgment of geographical peripherality as a source of structural disadvantage (Crescenzi and Rodríguez-Pose 2011). The emphasis on the spatial boundedness of knowledge flows may also be interpreted as warranting interventions aimed at minimizing the geographical distance between innovative actors in the public and private sector. Incubators and science parks are two examples of policy measures reflecting the idea that public policies can actively maximize spillovers, promoting regional innovative output by providing infrastructure that allows for a spatial concentration of regional innovative activities.

However, "classic" studies on LKS are often based on indicators that capture the potential for spatially bound knowledge spillovers rather than actual flows/contacts between agents. The mechanisms underlying the transmission of knowledge spillovers remain underexplored, meaning that the concept of LKS is still largely a "black box" (Döring and Schnellenbach 2006): while some authors suggest that market transactions rather than externalities may explain local knowledge flows, others point out that members of epistemic communities may be connected by ties that transcend geographical proximity. The insufficient understanding of how knowledge is actually transferred between individuals located in the same geographical area impedes the formulation of a clear normative message to policy-makers.

In response to these criticisms, other empirical work has focused more closely on the role of individuals as knowledge carriers, and in particular on the mobility of knowledge-carrying workers and researchers. In addition, the literature has explicitly acknowledged that innovative agents cannot rely exclusively on local knowledge assets. Highly innovative actors benefit from a combination of "local buzz" (Storper and Venables 2004) – i.e., the innovation-enhancing local environment based on frequent F2F contacts of individuals who are co-located in a confined, typically urban place – and "global pipelines," i.e., communication channels formed by a differentiated set of "global" actors (different streams of literature have looked at multinational firms, diasporic communities, universities, and "star" scientists) that increasingly tap into pools of external knowledge, bearing the associated communication cost/effort (Cantwell 2009; Saxenian 2006; Bathelt and Cohendet 2014).


Only the most recent developments in economic geography theory (reviewed in Sect. 5) overcome this dichotomous (local vs. global) conceptualization of knowledge transmission mechanisms by developing more sophisticated frameworks of understanding.

4 Innovative Agents "in Context": Local Specialization Patterns and Institutions

4.1 Economic Places: Industrial Specialization

Geographical distance between innovative agents is an important predictor of knowledge exchange costs. The communication of economically valuable (potentially not codifiable\codified knowledge) across large distances is possible but at increasing costs. However, a number of other characteristics of the local environment generate incentives for knowledge exchange and shape the synergies for innovation generation. In this context, a vast literature has dwelt on the role played by specialization patterns by contrasting the innovation performance of both highly specialized and diversified economic environments that often coexist in both developing and developed economies. A high degree of specialization facilitates the exchange of specialized industryspecific knowledge. Occurring between firms active in the same industry, these Marshall-Arrow-Romer (MAR) knowledge spillovers are deemed to spur innovation. MAR spillovers are a typical feature of “classic” industrial districts (Amin 2003). Conversely, “Jacobian spillovers” are associated with a diversified economic fabric, which is often found in big cities. The most valuable sources of knowledge that a firm may benefit from lie outside its own industry, suggesting that a diverse industrial structure allows for cross-industry knowledge flows that induce recombinant innovation. The empirical literature suggests that both Jacobian and MAR externalities play an important part in enhancing innovation. Possibly due to differences regarding methodology and level of aggregation, analyses come to mixed, often conflicting results (for extensive reviews see Beaudry and Schiffauerova 2009 and De Groot et al. 2009). Although part of the literature suggests that only specialization can be conducive to innovation, it must be stressed that MAR and Jacobian spillovers are not mutually exclusive (Beaudry and Schiffauerova 2009). Indeed, large cities can be simultaneously specialized in one or more sectors and simultaneously display a diverse range of further industries. Specialization and diversification patterns have been harmonically combined into “economic places” by two sub-streams of literature. The first stream has combined specialization patterns with a product life cycle perspective (see Duranton and Puga 2014 for a review). Moving from a static to a dynamic view of the role of specialization patterns in the creation of new technological knowledge, innovation processes at different stages of the product life cycle rely on different types of knowledge spillovers. Firms develop new products in diversified urban contexts –


Firms develop new products in diversified urban contexts – termed “nursery cities” – benefiting from access to a greater variety of knowledge sources, so that they can test new combinations until they identify the ideal production technology. Once the production technology is standardized, firms relocate to specialized places as the focus shifts from radical to incremental innovations and the ability to exchange knowledge with other firms from the same industry becomes more beneficial than having access to knowledge from a wide range of sectors. In the nursery-city approach, both types of specialization patterns should coexist in a balanced system of cities, as they play different roles at different product life cycle stages (Duranton and Puga 2014).

The second view that goes beyond the classic MAR versus Jacobian dichotomy proposes a more sophisticated understanding of sectoral diversity. The “related variety” approach concentrates on cognitive proximity between sectors (see Content and Frenken 2016 for a review). Drawing on the notion of absorptive capacity, the related variety framework holds that knowledge will not automatically “spill over” between any pair of industries: the identification and absorption of new knowledge require pre-existing complementary knowledge. Related industries share such complementary competences, and intermediate levels of cognitive proximity between related industries facilitate intersectoral knowledge flows conducive to innovation. Accordingly, neither specialization nor diversity per se enhances innovation: the former may lead to a too narrow knowledge base, whereas the latter might involve a lack of complementary knowledge across sectors. Instead, the composition of sectors in a region should ideally display an intermediate level of cognitive proximity between the different industries. Hence, the perspective of related variety suggests that it is “diversity ‘in what’ that matters” (Iammarino 2011, p. 148).
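Related variety is commonly operationalized with an entropy decomposition of regional employment: variety across coarse (e.g., 2-digit) sectors is read as unrelated variety, while the employment-weighted variety of fine-grained (e.g., 5-digit) industries within each coarse sector is read as related variety. The following is a minimal sketch of that decomposition; the industry codes and employment figures are hypothetical.

import numpy as np
import pandas as pd

# Hypothetical regional employment by 5-digit industry; the first two digits
# of each code define its 2-digit sector.
emp = pd.Series({
    "10110": 200, "10120": 150, "10710": 50,
    "26110": 300, "26200": 100, "62010": 400,
})
shares = emp / emp.sum()              # 5-digit employment shares p_i
sector = shares.index.str[:2]         # 2-digit sector of each industry
P_g = shares.groupby(sector).sum()    # 2-digit sector shares P_g

# Unrelated variety: entropy across 2-digit sectors.
unrelated = -(P_g * np.log2(P_g)).sum()

# Related variety: sector-share-weighted entropy of the 5-digit shares
# within each 2-digit sector.
def within_entropy(p):
    q = p / p.sum()
    return -(q * np.log2(q)).sum()

related = (P_g * shares.groupby(sector).apply(within_entropy)).sum()

print(f"related variety:   {related:.3f}")
print(f"unrelated variety: {unrelated:.3f}")

On this reading, a region scoring high on related variety combines many industries within a few broad sectors, while one scoring high on unrelated variety spreads employment across cognitively distant sectors.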

4.2 Relational-Institutional Places

While the specialization literature unquestionably abandons the a-spatial perspective of the linear model, it concentrates heavily on economic processes, essentially disregarding the institutional-relational dimension of territorial innovation processes. The concept of related variety does, however, share common roots with (regional) systems of innovation (Lundvall 1992) – the key components of institutional-relational places – and both streams are influenced by ideas from evolutionary economics and economic geography.

The systems of innovation (SI) perspective considers knowledge production as a nonlinear, interactive, and socially embedded process. The SI literature adopts a systemic perspective and considers the creation of new knowledge as the result of evolutionary processes in complex systems (for a review, see Asheim et al. 2019). Its emphasis on multiple feedback loops between innovative agents sharply contrasts with the linear model’s conceptualization of innovation as a one-way process. While in the linear model there are only three major types of innovative actors (corresponding to the categories of basic research, applied research, and product development), the SI approach allows for a great variety of participants in the innovative process.


The organizations with which firms interact to gain, develop, and exchange various kinds of knowledge include other enterprises but also government bodies, research institutes, universities, banks, etc. By embedding innovation in its social environment, this approach puts culture and institutions at the core of the analysis: habits, norms, and laws shape the relations between the innovative agents.

The literature has deployed the SI perspective at three major analytical levels: the sectoral, the national, and the regional. Sectoral systems of innovation (Malerba 2006) highlight sector-specific patterns of knowledge production and suggest that the relative importance of different types of knowledge spillovers and learning varies across sectors. At the national level, different institutional settings and governance structures shape the synergies between innovative agents and their evolutionary trajectory (Lundvall 1992). Combining the SI literature with concepts from economic geography that emphasize the local roots of innovation and learning (Storper 1997), economic geographers and regional economists have extended the SI perspective to the regional level (Edquist 1997). The regional systems of innovation (RSI) literature puts geography, in the sense of institutional-relational places, at the center of the analysis of spatial disparities in innovative performance. Iammarino (2005, p. 499) defines an RSI as “the localized network of actors and institutions in the public and private sectors whose activities and interactions generate, import, modify and diffuse new technologies within and outside the region.” From an RSI perspective, regionally specific modes of learning, technological trajectories, and knowledge bases constitute important reasons for regional disparities in innovative output.

The consideration of both “economic” and “institutional-relational” places has profound implications for innovation policies, which depart from the “one-size-fits-all” approach supported by the “linear model.” The design of any innovation policy should reflect region-specific modes of knowledge production and industrial specialization patterns, making the in-depth understanding of the technological trajectory and existing knowledge base of each region the starting point for any innovation policy (Uyarra and Flanagan 2010). The RSI’s emphasis on interactive learning implies that simply increasing innovation inputs is unlikely to maximize a place’s innovative potential. The shift from individual actors to a systemic view calls for policymakers to address the institutionally shaped relations between the components of the system. The rationale for public intervention comes from some kind of systemic failure, which calls for corrective measures aimed at improving the local institutional setup of a place.

In comparison with the clear-cut normative message of the linear model, just how policy-makers should translate the RSI approach into practice is less straightforward. The approach has been criticized because it provides little guidance on instruments and measures appropriate for tackling systemic failures, and its interpretative flexibility renders its use more difficult for policy-makers. Equally, there are divergent views regarding the exact components and borders of an RSI. On the empirical side, a bias toward high-performing clusters has also been criticized.


A further weakness of empirical RSI studies stems from the lack of indicators capable of measuring the performance of a system in terms of the quality of its knowledge flows and interactive processes, rather than in terms of absolute innovative output.

5 Bringing Different Approaches Together (1): Non-spatial Proximities, “Integrated” Frameworks, and Local-Global Connectivity

As discussed in the previous sections, knowledge spillovers do not spread uniformly across space but exhibit strong distance-decay effects. While geographical proximity (geography as physical distance) facilitates the transmission of imperfectly appropriable but spatially sticky knowledge, the creation of new knowledge remains a socially embedded, interactive process. However, despite its potentially supportive role for the exchange of knowledge, geographical proximity constitutes “neither a necessary nor a sufficient condition” for learning processes (Boschma 2005, p. 62). Learning processes and communication are shaped by the industrial specialization, technological trajectory, and institutional modes of innovation that are characteristic of specific economic and/or relational places. Consequently, the analysis of the geography of innovative processes calls for the joint analysis of the full set of physical, economic, and institutional conditions that make innovation possible. Economic geography theory has responded to this challenge in two ways. On the one hand, it has explicitly conceptualized the differential (and potentially independent) roles of spatial and non-spatial conditions; on the other, it has explored the full set of their interactions.

In the first stream, Boschma (2005) has proposed a framework that introduces four non-spatial types of proximity, conceptually independent from physical distance: (i) cognitive proximity, referring to the degree to which agents share a common knowledge base; (ii) organizational proximity, defined as “the extent to which relations are shared in an organizational arrangement” (Boschma 2005, p. 65); (iii) social proximity, measuring the social embeddedness of relations between agents based on friendship, kinship, and experience; and (iv) institutional proximity, which is based on agents sharing the same institutional rules and cultural habits. In this framework, cognitive proximity is considered the only form of proximity that is a permanent prerequisite for interactive learning and innovation: without overlapping knowledge bases, learning is impossible – even if there is high geographical proximity between the agents. In this context, co-location and physical proximity may still play an important role on a temporary basis, establishing contacts that are then maintained through the continuous presence of organizational, social, or institutional proximity. The positive effect of geographical proximity (geography as distance in our framework) might thus be more indirect and subtle than frequently assumed: it may help innovative actors to find the “optimal” balance between different a-spatial forms of proximity, shaping “economic” and “institutional” places conducive to innovation.


Acknowledging that proximity can be defined independently of physical-metric considerations prepares the stage for an integrated view of the forces influencing regional innovation processes. The introduction of alternative proximities makes it possible to adopt a new perspective on the role of geography as distance. Non-spatial proximities provide the justification for knowledge flows in networks. Regions may thus use alternative proximities to overcome geographical distance and tap into remote knowledge pools via global pipelines. Although this relativizes the significance of co-location, it is important to emphasize that Boschma’s (2005) framework is nonetheless compatible with the concept of local buzz: we may conceive local buzz as “cognitive, organizational, social and institutional proximity brought together in a reduced geographical environment” (Crescenzi and Rodríguez-Pose 2011, p. 383). From this point of view, alternative proximities influence both inter-regional and intra-regional knowledge flows. With respect to economic places, the notion of cognitive proximity is particularly fruitful for analyses of opportunities for learning across industries: as stressed in the related variety perspective, cross-sectoral knowledge flows hinge upon the right level of cognitive proximity. As far as institutional-relational places are concerned, the idea that place-specific innovation systems display idiosyncratic modes of learning suggests that a lack of local institutional proximity may impede successful learning.

The second stream of literature has focused more directly on the interaction between geography as economic places, institutional-relational places, and physical-metrical distance – while simultaneously acknowledging the importance of alternative, non-spatial proximities. Following an “integrated approach,” any analysis of a region’s innovative performance has to take five keystones into account: (i) the link between local innovative efforts and knowledge generation, as typically emphasized by a-spatial approaches; (ii) the geographical diffusion of knowledge spillovers and the region’s industrial specialization (representing geography as distance and geography as an economic place); (iii) the presence of networks based on alternative, non-spatial proximities; (iv) the genesis and structure of local and regional policies; as well as (v) the existence and efficiency of regional innovation systems, with the last two keystones reflecting geography as institutional-relational places (Crescenzi and Rodríguez-Pose 2011). The interaction of these five pillars shapes the creation of new knowledge in a region. In accordance with recent changes in economic geography theory, the importance of a-spatial networks and mobile capital with respect to global knowledge flows is underlined in this framework: the ability of local actors to establish external relations based on alternative proximities is assumed to determine the position of the region in global networks (e.g., where MNEs “pump” global knowledge into the local economy and “channel” the results of local innovative activities into global knowledge pipelines).

This highly dynamic stream of literature, which explicitly aims at disentangling the innovative impact of various spatial and non-spatial factors, has not yet reached a consensus on the relative importance of different forms of proximity. The heterogeneity of the results is likely to stem from both methodological and operational differences.
The estimation of knowledge production functions “augmented” to account for the impact of various proximities, although now customary in this literature, remains problematic due to the strong collinearity among the various proximities (whose impact these functions set out to isolate and


compare) and the potential simultaneity between innovative performance and the evolution of non-spatial proximity relations. In addition, the use of patent data to measure both “proximities” and performance might generate additional measurement problems. Thus, in order to further advance our understanding of the transmission mechanisms underlying the geography of innovation, the KPF approach should be supplemented by other techniques able to directly model the formation of links and networks and their spatiality before assessing their impact on “aggregate” performance.
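As an illustration of the estimation problem just described, the following is a minimal sketch of a log-linear “augmented” knowledge production function with several proximity-weighted spillover terms, followed by variance inflation factors (VIFs) to diagnose the collinearity among them. All variables are simulated and hypothetical; the spillover terms are deliberately constructed to overlap heavily, as they typically do in practice.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 200  # hypothetical regions

# Own innovative effort plus two spillover pools, one weighted by geographical
# proximity and one by cognitive proximity; built to be strongly correlated.
ln_rd = rng.normal(size=n)
ln_geo_spill = 0.9 * ln_rd + rng.normal(scale=0.2, size=n)
ln_cog_spill = 0.9 * ln_geo_spill + rng.normal(scale=0.2, size=n)
ln_patents = 0.5 * ln_rd + 0.2 * ln_geo_spill + rng.normal(scale=0.5, size=n)

X = sm.add_constant(pd.DataFrame({
    "ln_rd": ln_rd,
    "ln_geo_spill": ln_geo_spill,
    "ln_cog_spill": ln_cog_spill,
}))
fit = sm.OLS(ln_patents, X).fit()
print(fit.params)

# VIFs well above 10 signal that the separate effects of the proximity terms
# are poorly identified, however plausible the point estimates look.
for i, col in enumerate(X.columns):
    print(col, round(variance_inflation_factor(X.values, i), 1))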

6 Bringing Different Approaches Together (2): Local-Global Connectivity

Frontier research has moved on to uncover the fundamental drivers of global connectivity and their impacts on national and regional economies (Crescenzi et al. 2014; Crescenzi and Iammarino 2017). Understanding these “tectonic forces” of the contemporary world economy calls for the conceptualization and operationalization of the heterogeneity of global flows of capital, knowledge, and skills bundled into investment flows. Highly diversified firms, countries, and regions are connected across the globe, responding to the challenges of growing internationalization with heterogeneous strategies, resulting in evolving (and sometimes conflicting) corporate behavior and public policies. Conceptual and empirical work has explored the location strategies of multinational enterprises (MNEs) and their territorial impacts by investigating the diversity of actors involved and the heterogeneity of their linkages with different conceptual and empirical tools, in order to shed new light on their implications for innovation, employment, and economic recovery after the Great Recession. Three key streams of research have emerged in this area:

1. A first stream has explored the ways in which MNEs contribute to the formation of “global” pipelines/networks, examining their geographical, institutional, and socioeconomic drivers as well as their directionality, nature, and intensity.

2. Other work has instead focused its attention on the national-, regional-, and firm-level impacts generated by the linkages established by MNEs through global investment flows and has investigated how different geographical and institutional factors can shape these processes.

3. The analysis and discussion of public policies for both location and impacts is a cross-cutting theme that connects the two main streams of research mentioned above. The very limited number of contributions in this area aims to identify the key policy lessons to be learned from existing public policies targeting global connectivity and to place them in the framework of industrial and regional policies.


A comprehensive approach to the three elements of global connectivity outlined above – though still underdeveloped in the conceptualization and analysis of its regional innovation implications – comes from the literature on global value chains (GVCs) and global investment flows (GIFs), which focuses on how both can be leveraged for innovation at the sub-national regional level. Its attention is particularly on how regions can build, embed, and reshape GVCs for local benefit (see Crescenzi et al. 2019 for a review). This GVC-inspired approach makes a threefold contribution to the innovation literature. First, it develops key definitions and conceptual foundations for the analysis of the link between GVCs, GIFs, and innovation at the sub-national and local level. Second, it offers new conceptualizations and critical insights on the regional drivers and impacts of global connectivity, bridging macro-international and micro-firm-level approaches. Third, it actively investigates what works (and what does not) when these concepts are leveraged to shape public policies, with special reference to less developed regions.

The wider lesson from the frontier academic literature on GVCs and innovation policy for regions is that connectivity is key – both for firms building connectivity to the GVC in their home region and for firms building connectivity to the GVC in a foreign region. Putting up walls and retreating into domestic markets will not make regions better off. However, being open, although an essential component, is not enough. The Washington Consensus view that countries or regions simply need to be macroeconomically sound and open to investment misses critical insights into location-specific heterogeneity (Ponte and Sturgeon 2014). Human and institutional capacity is critical to ensure competitiveness along the task-driven service sections of the value chain and requires governance input. However, there are also significant regional disparities within countries regarding GVC participation. In order to capture the relevant benefits of globalization, regions need to develop a value-capture strategy; to do so, they must first understand the differentiated preferences and strategies of MNEs, in terms of both sectors and GVC stages as well as knowledge-seeking motives and entry modes. All these complexities produce very varied sub-national geographies of GVC connections and, in turn, a highly diversified territorial innovation landscape on a global scale.

7 Conclusions

The conceptualization of geography in the innovation literature has changed substantially since the heyday of the linear model. Persistent disparities in innovative performance across space have motivated researchers to develop more sophisticated analyses of the role of space and places in innovation processes. From the physical-metrical approach of geography as distance to the emphasis on specialization and diversification patterns (geography as economic place) and institutional-relational factors, economic geography theory has substantially evolved in terms of its contribution to the understanding of technological dynamics. While the abandonment of the linear model has always been at the very center of the geographical analysis of innovation, geographical proximity has progressively lost its role as the single most important type of proximity influencing innovation processes.


Cognitive proximity has emerged as a permanent requirement for interactive learning, while social, organizational, and institutional proximity may act as temporary substitutes for geographical proximity. Geographical proximity remains important in strengthening non-spatial proximities and in helping innovative actors find the right balance among the non-spatial forms of proximity. The analysis of the systematic interactions among these different dimensions calls for progressively more “integrated frameworks” in order to understand territorial innovation dynamics, and it offers new frameworks of understanding for the factors that shape global connectivity and its local impacts through local spillovers and embeddedness. Local innovation is ultimately about strong internal socio-institutional systemic conditions supporting and embedding global connectivity through global value chains.

These shifts in economic geography theory have important implications for innovation policies. The conceptualization of innovation as an interactive process occurring within complex innovation systems in highly interconnected global environments requires that policy-makers tackle linkages between actors on a multiplicity of scales. As a consequence, innovation policy must start from a profound understanding of a region’s idiosyncratic institutional setup, technological trajectory, knowledge base, and position in global flows of capital, skills, and knowledge. The identification of the potential barriers to innovative performance cannot be limited to the local dimension: it requires understanding the region as a local-global interface through which cooperation and networking are established with remote partners in other regions and countries. Policy measures aimed at stimulating knowledge flows should not merely concentrate on the local level but rather adopt a national and international perspective.

Over the past decade, influential reports by the World Bank (World Bank 2009), the European Commission (Barca 2009), and the OECD (2009a, b) have in different ways reflected recent theoretical changes in economic geography. While the World Development Report 2009 has the important merit of fully incorporating geography as distance and economic places into the formulation of development policies, the policy conclusions formulated by the OECD and the Barca Report fully endorse an integrated territorial approach to innovation which takes full account of the role played by institutional-relational factors and non-spatial proximities. The development of the economic geography theory of innovation has contributed toward a progressive shift in the policy paradigm from a purely “science and technology” approach to the emphasis on agglomeration and spatial proximity that has characterized innovation policies targeting cluster development and firm incubators. The more recent evolution in the territorial theory of innovation has opened the way to more balanced integrated policies that systematically account for the multifaceted influence of geography on innovation processes. However, when it comes to global connectivity, the development of the literature has not kept pace with real-world challenges and trends. The policy discourse has now given widespread recognition to the importance of global connectivity and global value chains (see the World Development Report 2020 for a review; World Bank 2019).
In addition, growing concerns are being voiced by more critical observers regarding an ongoing process of global disintegration and “regionalization” of value chains, with unknown developmental consequences for the economic peripheries.


In these areas, both conceptual and empirical contributions in economic geography are still lacking (Crescenzi et al. 2019), leaving us ill-prepared to understand the spatial inequality consequences of possibly major shifts (or reversals) in global integration processes.

While the effectiveness of innovation policies has substantially benefited from the evolution of economic geography theory, a number of relevant aspects remain to be further explored both conceptually and empirically. From the conceptual point of view, further research is needed on the linkages between the micro-level of the individual innovative actors, the meso-level of their territorial interactions, and the diffusion channels of “macro” global flows of capital, skills, and knowledge. A sound theory for this complex set of processes is a necessary condition to open the “black box” of knowledge generation and diffusion. If the (increasing) importance of non-spatial proximities is now fully acknowledged, further work is needed on the reasons and the mechanisms that govern the development and the evolution of such proximities. In the same way as location theory aims to explain the co-location decisions of economic agents in physical space, it is necessary to explore the fundamental mechanisms that drive the development of non-spatial proximities between innovative agents in the cognitive space.

Conversely, empirical analyses of the geography of innovation need to substantially broaden their scope, both in terms of methodologies and in the use of available data, in order to cope with increasing theoretical sophistication and new policy challenges (in both developed and emerging economies). Substantial progress is needed toward a more detailed identification of the role of spatial and non-spatial networks in this context. In addition, the reliance on patent data has led to the under-examination of non-patented forms of innovation, including process and organizational innovation. The integrated use of different data sources (including firm-level innovation surveys such as the Community Innovation Survey) is certainly an important development in this direction, but the emergence of new and more sophisticated research questions calls for the collection of more sophisticated micro-data on the innovation and relational behavior of firms, individuals, and institutions.

The economic geography literature offers the diagnostic tools for regional policymakers to identify existing and evolving competitive advantages and to decide how to compete on existing strengths, or whether indeed to develop new ones. The next step is to offer evidence-based, theory-driven guidance on how to work with current and future GVC actors and their potential integration. This will help regions understand how they in particular want to engage with GVCs. Approaches for regions on the technological frontier will differ from those in lagging regions. Finally, building and embedding a region into GVCs does not come without potential drawbacks, which developments in the literature should be able to identify and offer new tools to mitigate. Policies and programs require coordination, integration, and consistency for their success. Advancements in economic geography should offer policy-makers support to reshape their region and its interaction with GVCs. For regions it is “not only a matter of whether to participate in the global economy, but how to do so gainfully” (Fernandez-Stark et al. 2011, p. 6), and economic geography needs to shed light on what works in practice in order to achieve this.

8 Cross-References

▶ Agglomeration and Jobs
▶ Cities, Knowledge, and Innovation
▶ Clusters, Local Districts, and Innovative Milieux
▶ Endogenous Growth Theory and Regional Extensions
▶ Generation and Diffusion of Innovation
▶ Incorporating Space in the Theory of Endogenous Growth: Contributions from the New Economic Geography
▶ Knowledge Flows, Knowledge Externalities, and Regional Economic Development
▶ Knowledge Spillovers Informed by Network Theory and Social Network Analysis
▶ Neoclassical Regional Growth Models
▶ Networks in the Innovation Process
▶ New Economic Geography After 30 Years
▶ Relational and Evolutionary Economic Geography
▶ Social Capital, Innovativeness, and the New Economic Geography
▶ Systems of Innovation and the Learning Region
▶ The Geography of Innovation
▶ The Geography of R&D Collaboration Networks

Acknowledgments The author would like to thank the Editors of the Handbook for their encouragement and support with the revision of this chapter. This research has received funding from the European Research Council under the European Union’s Horizon 2020 Programme H2020/2014-2020 (Grant Agreement n. 639633 – MASSIVE – ERC-2014-STG). The author remains solely responsible for any errors contained in the chapter.

References

Amin A (2003) Industrial districts. In: Sheppard E, Barnes TJ (eds) A companion to economic geography. Blackwell, Oxford, pp 149–168
Asheim BT, Isaksen A, Trippl M (2019) Advanced introduction to regional innovation systems. Edward Elgar, Cheltenham
Barca F (2009) An agenda for a reformed cohesion policy. European Commission, Brussels
Bathelt H, Cohendet P (2014) The creation of knowledge: local building, global accessing and economic development – toward an agenda. J Econ Geogr 14:869–882. https://doi.org/10.1093/jeg/lbu027
Beaudry C, Schiffauerova A (2009) Who’s right, Marshall or Jacobs? The localization versus urbanization debate. Res Policy 38(2):318–337
Boschma RA (2005) Proximity and innovation: a critical assessment. Reg Stud 39(1):61–74
Breschi S (2011) The geography of knowledge flows. In: Cooke P, Asheim BT, Boschma R, Martin R, Schwartz D, Toedtling F (eds) Handbook of regional innovation and growth. Edward Elgar, Cheltenham, pp 132–142
Cantwell JA (2009) Location and the multinational enterprise. J Int Bus Stud 40(1):35–41
Content J, Frenken K (2016) Related variety and economic development: a literature review. Eur Plan Stud 24(12):2097–2112. https://doi.org/10.1080/09654313.2016.1246517


Crescenzi R (2014) The evolving dialogue between innovation and economic geography: from physical distance to non-spatial proximities and ‘integrated’ frameworks. Papers in Evolutionary Economic Geography (PEEG) 1408, Utrecht University, Section of Economic Geography
Crescenzi R, Gagliardi L (2018) The innovative performance of firms in heterogeneous environments: the interplay between external knowledge and internal absorptive capacities. Res Policy 47(4):782–795. https://doi.org/10.1016/j.respol.2018.02.006
Crescenzi R, Iammarino S (2017) Global investments and regional development trajectories: the missing links. Reg Stud 51(1):97–115. https://doi.org/10.1080/00343404.2016.1262016
Crescenzi R, Rodríguez-Pose A (2011) Innovation and regional growth in the European Union. Springer, Berlin/Heidelberg/New York
Crescenzi R, Pietrobelli C, Rabellotti R (2014) Innovation drivers, value chains and the geography of multinational corporations in Europe. J Econ Geogr 14(6):1053–1086. https://doi.org/10.1093/jeg/lbt018
Crescenzi R, Harman O, Arnold D (2019) Move on up! Building, embedding and reshaping global value chains through investment flows: insights for regional innovation policies. Background paper for an OECD/EC workshop on 21 September 2018 within the workshop series “Broadening innovation policy: new insights for regions and cities”, Paris. https://www.oecd.org/cfe/regional-policy/CrescenziHarman(2018)MoveOnUp.pdf
De Groot HLF, Poot J, Smit MJ (2009) Agglomeration externalities, innovation and regional growth: theoretical perspectives and meta-analysis. In: Capello R, Nijkamp P (eds) Handbook of regional growth and development theories. Edward Elgar, Northampton, pp 256–281
Döring T, Schnellenbach J (2006) What do we know about geographical knowledge spillovers and regional growth? A survey of the literature. Reg Stud 40(3):375–395
Duranton G, Puga D (2014) The growth of cities. In: Aghion P, Durlauf S (eds) Handbook of economic growth, vol II. Elsevier, Amsterdam, pp 781–853
Edquist C (1997) Introduction. In: Edquist C (ed) Systems of innovation: technologies, institutions, and organizations. Pinter, London, pp 1–35
Fernandez-Stark K, Bamber P, Gereffi G (2011) The offshore services value chain: upgrading trajectories in developing countries. Int J Technol Learn Innov Dev 4(1–3):206–234
Godin B (2006) The history of the linear model of innovation: the historical construction of an analytical framework. Sci Technol Hum Values 31(6):639–667
Iammarino S (2005) An evolutionary integrated view of regional systems of innovation: concepts, measures and historical perspectives. Eur Plan Stud 13(4):497–519
Iammarino S (2011) Regional innovation and diversity. In: Cooke P, Asheim BT, Boschma R, Martin R, Schwartz D, Toedtling F (eds) Handbook of regional innovation and growth. Edward Elgar, Cheltenham, pp 143–154
Kline SJ, Rosenberg N (1986) An overview on innovation. In: Landau R, Rosenberg N (eds) The positive sum strategy. National Academy Press, Washington, DC
Lundvall B-A (ed) (1992) National systems of innovation: towards a theory of innovation and interactive learning. Pinter, London
Malerba F (2006) Innovation and the evolution of industries. J Evol Econ 16(1):3–23
OECD (2009a) How regions grow. OECD, Paris
OECD (2009b) Regions matter: economic recovery, innovation and sustainable growth. OECD, Paris
Ponte S, Sturgeon T (2014) Explaining governance in global value chains: a modular theory-building effort. Rev Int Polit Econ 21(1):195–223
Saxenian A (2006) The new Argonauts: regional advantage in a global economy. Harvard University Press, Cambridge, MA
Storper M (1997) The regional world: territorial development in a global economy. Guilford Press, New York


Storper M, Venables AJ (2004) Buzz: face-to-face contact and the urban economy. J Econ Geogr 4(4):351–370
Uyarra E, Flanagan K (2010) From regional systems of innovation to regions as innovation policy spaces. Environ Plan C Gov Policy 28(4):681–695
World Bank (2009) World development report 2009: reshaping economic geography. World Bank, Washington, DC
World Bank (2019) World development report 2020: trading for development in the age of global value chains. World Bank, Washington, DC

Geographical Economics and Policy

Henry G. Overman

Contents
1 Introduction
2 Empirical Analysis: Data
3 Empirical Analysis: Causality
4 Policy Evaluation
5 Conclusions
6 Cross-References
References

Abstract

This chapter is concerned with the process by which geographical economics influences policy. It considers several barriers that limit this influence, focusing specifically on the availability of data, the limitations of spatial analysis, and the role of the evaluation of government policy. It considers why these problems present such significant barriers and proposes some solutions. In terms of the availability of data, the chapter explains why problems concerning the correct unit of analysis and measurement error may be particularly acute for spatial data (especially at smaller spatial scales). Resulting concerns about the representativeness of data and the mismatch between functional and administrative units may further hamper interaction with policy makers. For spatial analysis, the major problem concerns the extent to which empirical work identifies the causal factors driving spatial economic phenomena. It is suggested that greater focus on evaluating the impact of policies may provide one solution to this general identification problem.

H. G. Overman (*)
Department of Geography and Environment, London School of Economics and Political Science, London, UK
e-mail: [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_93


Keywords

Policy maker · Geographical information system · Causal effect · Policy evaluation · Geographical economics

1 Introduction

In most countries, economic prosperity is very unevenly distributed across space. Regions, cities, and neighborhoods seem to be very unequal. This is true if we look at average earnings, employment, education, and almost any other socioeconomic outcome. Regional policy, urban policy, and even neighborhood policy are all largely based on concerns about these kinds of disparities, and tackling these persistent disparities is a key policy objective in many countries. Providing a rigorous understanding of the nature, extent, causes, and consequences of these disparities has been a key motivation behind the development of geographical economics (broadly defined).

This chapter focuses on the policy response to these disparities and specifically on the interaction between academic research and spatial economic policy. In the limited space available, it is clearly not possible to summarize all the available research that has relevance to policy makers concerned with spatial economic policy. Instead, this chapter considers the process by which research informs policy, focusing specifically on the role of empirical analysis. In doing so, the chapter considers criticisms of existing empirical work, introduces means of evaluating the impact of spatial policies, discusses the major barriers to interaction, and makes some suggestions on how these might be addressed in future research. The last two of these issues have received some consideration by, e.g., Martin and Sunley (2011) from the perspective of economic geography “proper.” In contrast, this chapter is specifically concerned with geographical economics (i.e., the research field that has evolved at the interface between economics and geography and which this chapter treats as synonymous with spatial economics). However, it is clear that many of the issues apply more generally in terms of the impact of research on policy making.

The chapter focuses specifically on the role of empirical analysis and policy evaluation in informing policy. The strong theoretical bias of the new economic geography means that many of the issues concerning the application of theory to policy have received fairly detailed consideration in the literature (see, e.g., Baldwin et al. 2003). Combes et al. (2005) provide a diagrammatic framework which carefully outlines many of the central issues. This literature reaches two broad conclusions. First, from a positive perspective (i.e., what will be the impact of a specific policy?), the theoretical literature is better placed to provide general guiding principles than detailed answers. Second, our theoretical understanding of what policy should do (i.e., normative analysis) is far less developed and, as usual, depends crucially on assumptions about the relevant objective function.


In short, from an academic perspective, while theoretical analysis is not always sufficiently well developed to be useful in guiding policy, the problems are at least well understood. From a policy makers’ perspective, these theoretical issues are arguably second order. Instead, the fundamental concern is whether stylized formal modeling can ever provide “real-world” insights. Of course, these concerns are not unique to policy audiences nor to spatial policy. However, assuming some policy makers do not hold such reservations about the validity of formal modeling (or, at least, are willing to set them aside), the central issue from a policy perspective becomes the provision of empirical evidence about the applicability of the underlying theory. This issue, and the barriers faced in providing such evidence in the specific context of policy formation, has received far less attention in the literature. It is for this reason that this chapter focuses on the role of the empirical analysis of spatial data in policy making.

Of course, questions concerning the empirical analysis of spatial data go beyond the role this might play in assessing the validity of formal modeling. Indeed, for most policy makers, this would be a distinctly second-order concern. Instead, experience suggests that policy makers look to empirical analysis to do (at least) three things: describe the problem, assess the underlying causes, and evaluate the alternative policy responses. In fact, these three roles are not so far removed from how many geographical economists would prefer to structure empirical research. In an ideal world, theoretical modeling would deliver predictions about the underlying causes of spatial disparities. Appropriate data would then be used to describe these disparities and to test the validity of predictions from the theory. If the data support the predictions from the theoretical model, one could then use the model to think through the impact of alternative policy responses that have not yet been implemented. More recently, it has also been recognized that this logic can be reversed, with theory providing predictions about the impact of policy and empirical evaluation of policy that has been implemented then used to test the underlying theory. That is, assessing the causal impact of existing policies may be useful in increasing our theoretical understanding of how the spatial economy works, what causes spatial disparities, and what, if anything, policy might do to address these disparities. In addition, there is considerable interest in establishing the causal impact of existing policy independent of what it can tell us about theory.

Geographical economics faces several barriers in addressing each of these questions. Data availability can hamper the provision of basic descriptive statistics, as well as further empirical analysis. Spatial research (by academics) often fails to pay enough attention to the central question of identifying the causal mechanisms at work. Finally, and relatedly, much policy evaluation (by governments and consultants) fails to identify the causal impact of policies, often despite claims to the contrary. On all three dimensions, and particularly with respect to policy evaluation, this chapter will argue that the empirical literature analyzing spatial data often falls some way short of the standards set by other fields of economics. This partly reflects the inherent difficulty of spatial analysis but also stems from a failure by some researchers to adopt methodological developments that might improve analysis.
The empirical literature is only beginning to address this shortcoming.


The rest of this chapter is structured as follows. The next section focuses on the availability of suitable data, the starting point for better empirical analysis. The two subsequent sections deal with questions of causality and the policy evaluation of spatial policies, while a final section briefly concludes.

2 Empirical Analysis: Data

Problems of data availability depend on the policy context, but the most common problem tends to arise from the lack of available data at the appropriate spatial scale. Even at large spatial scales, these problems can be acute. For example, in the UK, even basic statistics to describe spatial disparities across cities are not easily available. In other countries, while data for larger spatial units is readily available (e.g., for US metropolitan areas), more detailed data (e.g., on firm location) may not be available or may be subject to quite restrictive access arrangements. From a policy perspective, the lack of spatial data that can be used to generate statistics to describe the problem at hand represents a major barrier to the use of geographical economics in policy making. This barrier is arguably greater for geographical economics than for some other disciplines similarly concerned with spatial disparities but which rely less on formal modeling and quantitative empirical analysis.

Even when data is available, however, there remains the fundamental problem that, for most issues in geographical economics, the correct unit of analysis is difficult to define. For example, researchers in the field of international economics can often use nation-state boundaries to define the appropriate unit of analysis because these boundaries generate significant barriers to factor mobility (and differences in factor availability underpin many theories of international trade). For the spatial researcher, in contrast, city or regional boundaries are often no more than administrative creations. For some problems, where administrative units form the appropriate unit of observation, such data might be sufficient. This may partially explain, for example, why local tax competition and public good provision have been so extensively studied in the empirical literature. For other outcomes, such as economic growth, administrative units may provide a very poor substitute for properly defined functional economic areas (see Cheshire and Magrini (2009) for further discussion).

Problems concerning the definition of a suitable unit of analysis and the availability of data for these units often become more serious at smaller spatial scales. For example, the appropriate definition of a “neighborhood” is a major concern for researchers interested in identifying the importance of neighborhood effects (i.e., whether neighborhood composition affects individual outcomes over and above any effect of individual characteristics). As was the case for larger spatial units, these definitional problems may represent a more serious disadvantage for geographical economics than for disciplines that adopt qualitative approaches to consider the existence of neighborhood or peer effects. An ethnographic study, for example, can easily accommodate self-defined notions of neighborhoods. In contrast, (spatial) econometric analysis requires neighborhoods to be formally defined so that data can be collected that characterizes the structure of the neighborhood.


Nor is this definitional problem the only, or even the most significant, barrier to econometric analysis in this area (and many related ones concerned with feedback between units of analysis in the outcome of interest). We return to this issue below.

Even when data is available for something approximating the correct unit of analysis for the question at hand, there may be considerable measurement error present. Of course, measurement error is usually present in nonspatial data, but the problem is more pronounced for spatial data because all the standard problems occur (e.g., is employment correctly defined, measured, and recorded?), but there is an additional spatial “allocation” problem. This spatial allocation problem arises because the construction of spatial data for any specific unit of analysis requires dots on a map to be allocated to units in a box (see Duranton and Overman (2005) for further discussion). Inaccuracies can occur both in terms of the geographical location of the “dot” (e.g., the spatial coordinates of zip codes) and the boundaries of the box. Uncertainty over the correct definition of the unit of analysis, as discussed above, exacerbates these difficulties. These measurement problems become more profound as the spatial resolution of the data increases because any absolute measurement error translates into greater relative error at smaller scales. Even if points are accurately allocated to geographical units, researchers using sampled data face an additional problem at small scales: for a given sampling frame, smaller spatial scales reduce the average sample size in any given geographical unit (a simple simulation of this sampling problem is sketched at the end of this section).

As discussed in Duranton and Overman (2005), these problems can sometimes be avoided by working in continuous space. Whether this is a solution depends, however, on the problem at hand. When the issue is one of individual behavior (e.g., whether individual labor market outcomes are affected by employment accessibility), it may make sense to work in continuous space using geo-coded individual data. In contrast, when the interest is in broad spatial patterns (e.g., what causes differences in city growth), analysis in continuous space, based on individual geo-coded data, may not be helpful. Regardless of whether switching to continuous space may help with analysis, it often will not solve the problems that poor data create in terms of generating descriptive statistics for spatial units of observation. Experience suggests that a lack of suitable data and the resulting inadequacy of descriptive statistics represent a significant barrier to using geographical economics to inform spatial policy making.

As argued in Overman (2010), the increased use of geographical information systems (GIS) is slowly helping to solve many of these problems of data availability. GIS are helping reduce measurement error as well as making more data available by facilitating the reconciliation of data for different non-nested spatial units. The increased availability of geo-referenced data also allows researchers to avoid the need for arbitrary discretizations of data (because it allows the researcher to construct data for appropriate spatial units). Finally, new types of data are helping increase our understanding of spatial economic phenomena. Interestingly, however, even if data availability becomes less of an issue in terms of analysis, a lack of descriptive statistics for specific administrative units may continue to cause a problem in terms of the interaction with policy makers.
A major part of the problem stems from the fact that while these administrative units
may be arbitrary from an analytical perspective, they are hugely important from a policy makers’ perspective. Policy makers want to know how these administrative units are performing, partly as an input into decision making but also because their performance is often assessed by comparison to other similar units. As a result, even when such data might not be a particularly useful guide to relative performance (if, say, a labor market spreads beyond the administrative boundary), it will still be of great interest to policy makers.

The problem is further compounded when it comes to the empirical analysis of the underlying causes of spatial disparities. Specifically, in the absence of descriptive statistics based on representative data for particular places, many policy makers think that no progress is possible. This problem appears to arise because many policy makers struggle with the idea that sampled data can be informative about spatial processes unless the data is representative (which they equate with the production of “accurate” descriptive statistics for specific places). Of course, this problem of representativeness is not unique to spatial settings, but experience suggests that it seems to be particularly important in terms of policy makers’ concerns about empirical analysis in this area. One possible underlying source of the problem may be the belief that each location is somehow unique, either in terms of its characteristics or its responsiveness to policy initiatives (or both). However, as is increasingly recognized in applied microeconomics, such heterogeneity (including in the response to changes) may require care to be taken in interpreting statistical estimates but does not invalidate regression analysis of the problem at hand (see, e.g., the discussion of local average treatment effects in Angrist and Pischke 2009). Given that these insights are often poorly understood by many researchers, it is no surprise that they have not had much influence to date on policy makers. It is not so clear why the criticism seems to have such bite in terms of the empirical analysis of spatial data. It would be interesting to know, for example, whether the problem is particularly acute because of other disciplines’ strong emphasis on the uniqueness of location as an argument against quantitative approaches to spatial problems. Regardless of the underlying reason, the lack of large samples for administrative spatial units remains a barrier to informing policy making at all spatial scales, even though it need not be.
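The sampling-frame point made above can be illustrated with a minimal simulation: a fixed national sample, a true rate that is identical everywhere, and progressively finer spatial units. The frame size, rates, and unit counts are all hypothetical.

import numpy as np

rng = np.random.default_rng(1)
frame = 10_000      # fixed national sampling frame
true_rate = 0.30    # true employment rate, identical in every unit

# As units get smaller, each unit's estimate rests on fewer observations,
# so apparent "spatial disparities" emerge from sampling noise alone.
for n_units in (10, 100, 1_000):
    per_unit = frame // n_units
    est = rng.binomial(per_unit, true_rate, size=n_units) / per_unit
    print(f"{n_units:>5} units, {per_unit:>5} obs each: "
          f"s.d. of unit-level estimates = {est.std():.3f}")

Even though every unit shares the same true rate, the dispersion of unit-level estimates grows roughly threefold with each tenfold increase in the number of units, which is one reason fine-grained descriptive statistics can mislead.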

3 Empirical Analysis: Causality

As discussed above, even when appropriate spatial data is available, a second broad set of problems concerns the type of empirical analysis that has traditionally been undertaken using such data. These problems are discussed in detail in Gibbons et al. (2015), who argue that the biggest problem stems from the fact that traditional spatial econometric and statistical analysis has not paid sufficient attention to the crucial issue of identification. This has profound implications for our ability to understand the causes of spatial disparities and for empirical analysis to influence the development of policy.


To understand why, one needs to consider the way in which the empirical analysis of observed data might allow us to understand causality. In many fields of economics, empirical research is increasingly concerned with questions about causality (Angrist and Pischke 2009), that is, questions of the type “if we change x, what do we expect to happen to y?” This is particularly the case in fields focused on individual (microeconomic) rather than aggregate (macroeconomic) behavior. Although geographical economics is clearly concerned with both levels of analysis, if it is to be useful in policy making, then these types of questions must take center stage. After all, policy usually seeks to change some x in order, hopefully, to achieve some desired change in outcome y. Even when policy can directly influence the outcome of interest, we still need to understand how economic agents adjust to any change so that we can establish what will happen after this adjustment has taken place.

The fundamental challenge in answering these questions for (most) economic data is that the determinants (x) are not randomly assigned. This is certainly the case for many policy interventions, where x (e.g., investment in the transport network) will often be specifically set to (partially) reflect differences in the outcome of interest y (e.g., the level of GDP). As a result, in real-world data, we jointly observe x and y, so we lack the counterfactual: what would have happened if x had been set at some different level? This is a problem because it is the comparison of actual outcomes to this counterfactual that identifies the causal impact of determinant x on outcome y. Fortunately, applied economics has come a long way in its efforts to find credible and creative ways to answer such questions by constructing counterfactuals from observational data. Unfortunately, however, such methods have not been widely applied in much applied analysis of spatial data, particularly in analysis undertaken using the standard spatial econometrics toolbox. Instead, much applied spatial econometrics research assumes that we know the way in which spatial interactions occur, writes down the corresponding spatial econometric model, and estimates the parameters by nonlinear methods such as (quasi-)maximum likelihood. Questions of identification (i.e., does an estimated correlation imply that some determinant x causes outcome y?) have generally been addressed by asking which spatial processes best fit the data. While this sounds straightforward, Gibbons et al. (2015) explain that it is very hard to distinguish between alternative specifications that have very different implications for which causal relationships are at work. This fundamental identification problem, and the lack of attention given to it, significantly reduces the usefulness of this kind of spatial econometric analysis for policy making.

In practice, analytical capacity constraints limit the extent to which many government departments can engage with quantitative (econometric) analysis. Given this more general problem, it is perhaps no surprise that spatial econometric model specifications and estimation are sufficiently complex that research in this tradition has often proved very hard to communicate to policy makers. Coupled with concerns about the underlying secondary data, this can often lead policy makers to prioritize research which focuses on carefully describing the nature of spatial disparities rather than properly identifying the underlying causes.

1332

H. G. Overman

reinforced by political interest in outcomes for specific administrative units. Again, it would be interesting to understand why this tendency to conflate description with analysis is so pronounced in the area of spatial disparities and policy making.

If data availability and the type of analysis undertaken with available data represent significant barriers to informed policy making, a further barrier arises because, even when these problems are recognized, they can be very hard to address. The fundamental reason for this is that for many spatial economic phenomena of interest, suitable identification strategies can be hard to develop. Researchers are making progress in this area, but, as argued in Gibbons et al. (2015), progress is likely to remain slow unless issues of identification are given far more precedence in spatial empirical analysis. They suggest that these issues need to be put at center stage and discuss strategies for dealing with identification in spatial settings. One possibility is to use explicit sources of randomization that occur as a result of institutional rules and processes. For example, the random allocation of dorm-mates has been used by Sacerdote (2001) and others to study peer effects in college grades. Randomization often does not solve all identification problems, but it does reduce problems arising from the self-selection of individuals into groups.

However, when it comes to areas of substantive policy interest, there are several barriers to randomization, especially in terms of exposure to policy interventions. The major one is arguably political. While many academics are comfortable with randomization (e.g., because they are willing to start with the assumption that a policy will have no effect), this is a far harder proposition for policy makers. For example, if a policy maker starts with the assumption that policy will be beneficial, then randomization generates ethical concerns that those most in need might not be treated. These ethical, and other practical, issues have been extensively discussed by many of the so-called randomistas who advocate the use of random trials in the development context. While they are clearly making some mileage in specific circumstances, the general arguments are not yet won, even in circumstances in which randomization would be exceptionally helpful. In addition to this central problem, large-scale field experiments such as the Moving to Opportunity program are rare and costly (and still suffer from design flaws that are very difficult to avoid). On the other hand, small-scale experiments suffer from concerns about external validity (i.e., the extent to which the results would generalize to other contexts). Such concerns about external validity, although not expressed in this way, may have bite for policy makers in the area of spatial policy who, as discussed above, often think of every place as being somehow unique. For all these reasons, it is hard to imagine policy makers agreeing to experiments to answer many spatial questions (even if such experiments could be designed), and it is therefore unlikely that randomization will represent a way forward for many areas of interest.

In the absence of suitably randomized data, appropriate instrumental variable strategies may represent an alternative way of circumventing the reverse causality problems that bedevil much spatial analysis (particularly in areas important to policy making).
That said, it is often hard to think of suitable instrumental variables in situations where we are interested in "area effects" that arise because of feedback to
the outcome of interest from the outcomes of other economic agents located nearby. As has been understood for some time, econometric analysis of such "endogenous social effects" faces severe identification challenges (see Gibbons et al. 2015). This can make it very difficult to assess whether such effects are occurring or whether the appearance of interaction arises because of underlying similarities between nearby units of observation. It is interesting to note that empirical analysis looking at neighborhood effects (e.g., the impact of neighborhood on schooling) tends to find little evidence of strong effects when it carefully addresses these identification issues. This is in direct contrast to more qualitative approaches. This may partly explain why geographical economics has had relatively little influence on policy aimed at neighborhood "mixing" and other initiatives to "mitigate" neighborhood effects. It may be that the best we can do in these circumstances is to make policy makers realize the difficulties inherent in distinguishing these alternatives and point out that in research to date, the more careful the identification strategy, the less evidence there is of interaction through endogenous social effects.

In other spatial settings, however, where interest is not limited to endogenous social effects, it is increasingly possible to develop effective instrumental variable strategies as a result of policy designs, institutional rules, and natural environmental features (or even better, changes in these factors). Overman (2010) and Gibbons et al. (2015) provide many concrete examples. One significant problem remains, however: these strategies can be very hard to explain to policy makers (at least those with little or no economics training) who do not fully understand the need for careful identification strategies.

One possible way to interest policy makers in these issues is through the use of identification strategies based on the details of existing policy interventions. Policy makers often need to evaluate the impact of policy. In addition, specific policy features may also help with the identification of causal factors at work in spatial processes. It would appear, then, that there may be two linked arguments for focusing greater efforts on credible policy evaluation. First, one would hope that effective policy evaluation should be a key input into policy development. Second, such policy evaluations may provide useful identification strategies to increase our understanding of the way the spatial economy works. Because of the possibilities this presents, it is worth considering these issues in some detail, and it is to this that we now turn.

4 Policy Evaluation

Policy-specific outputs (e.g., the number of workers trained or firms assisted) are increasingly well monitored by governments. In contrast, many formal (i.e., government-sponsored) evaluations of policies that seek to look at outcomes do not use credible identification strategies to assess the causal impact of policy interventions. As for spatial empirical analysis more generally, this problem appears to be particularly acute for spatially targeted policies. Once again, these problems partly stem from the difficulty of coming up with identification strategies in
spatial settings. That is, what would have happened to the unit of analysis (the area, firm, worker, etc.) in the absence of the policy intervention? As emphasized by the literature on program treatment effects (see, e.g., DiNardo and Lee 2010), solving this problem requires the construction of a valid counterfactual that can then be compared to observed outcomes. In this section, we argue that, despite the difficulties, such an approach can be applied to many spatial policies and that the resulting evaluation can be informative about both the effect of policy and the spatial economic processes at work. Some concrete examples, mostly drawn from the USA and UK, will be used to help clarify the issues.

Let us start with the example of Enterprise Zones (also known as Empowerment Zones in the USA and referred to below as EZs). These spatially targeted policies aim to improve economic outcomes (e.g., employment and number of businesses) in deprived areas. To identify their causal effect, we need to figure out what would have happened in these areas in the absence of intervention. One possible identification strategy is to compare these areas to other similar areas that were not targeted by the policy. For many government-funded reports, even this simple strategy would substantially improve the quality of evaluations. From an academic perspective, however, such simple comparisons remain problematic because they require very strong identifying assumptions. Specifically, unless we have an exhaustive list of area characteristics that might influence local economic outcomes, we might worry that some unobserved characteristic of areas drives both the decision to target the area and outcomes in that area. In this case, we might wrongly attribute any change in outcomes to the policy when, in fact, it is driven by unobservable area characteristics. Much of the recent improvement in the evaluation of program treatment effects has come from novel ways of addressing this problem combined with a refined understanding of how to interpret the resulting estimates.

One possibility is to compare outcomes for those areas that receive funding to those areas that applied for, but did not receive, funding. This strategy has been used by Busso et al. (2013) in their evaluation of the US Empowerment Zone policy. Such a strategy can be highly effective in removing the influence of many unobservables that might bias estimates of policy impact, especially if restrictions to funding limit the number of areas treated so that selection among the applicants is less likely to be driven by these unobservable characteristics. Comparing outcomes for the two groups will then tell us whether those that won the competition actually do better, and we may be willing to attribute this to the impact of the policy. Analysis could also compare those that entered the competition to areas that appear to be similar but that did not enter the competition (to see whether those that entered the competition somehow differ from those that did not). For these kinds of strategies to achieve identification of the causal effect of the policy requires that, conditional on observable characteristics of areas, treatment is not correlated with any unobservable characteristic that directly influences the outcome of interest.

The timing of policy interventions may provide another possible source of identification. For example, EZs given money early on should start improving before those given money later.
If they do not, that raises questions about whether treatment caused any improvement (or decline) or instead whether this was caused by some
other factor (such as changes in the macroeconomy). These strategies have received recent application in the literature, including the work by Busso et al. (2013) on US Empowerment Zones. For this strategy, identification of the causal effect of the policy requires that, conditional on observable characteristics of areas, the timing of treatment is not correlated with any unobservable characteristic that directly influences the outcome of interest.

Even in situations where the researcher cannot be sure that decisions to fund (or the timing of funding) are uncorrelated with all unobservable characteristics that directly influence the outcome of interest, we may believe that this condition holds for marginal decisions. Imagine, for example, that the government makes its funding decisions based on a ranking of projects from best to worst. Such detailed assessment of projects often occurs after a rougher process has ruled out the weakest projects (so the sample of projects subject to the more detailed ranking may be those that make it through this first screening process). If a researcher has access to the ranking of projects, then this would allow the comparison of outcomes for otherwise similar areas that were just "above the bar" (and so got treated) to outcomes for areas just "below the bar" (which did not get treated). Sometimes, the criteria for treatment will be based on some observable characteristic of areas rather than some ranking based on the quality of bids submitted to the program under consideration. Then areas that just satisfy the criteria and so get treated can be compared to areas that just fail to satisfy the criteria and so do not get treated. Some policies, such as the UK's Local Enterprise Growth Initiative, may use a combination of cutoff criteria and competition to decide who gets treated (from among those that are eligible). Applications of such regression discontinuity designs to spatial economic policies include Baum-Snow and Marion (2009) and Dachis et al. (2011). As discussed further in Lee and Lemieux (2010), these discontinuity designs can be used to identify the causal effect of policy, provided that applicants do not have full control over the characteristics that determine treatment. Notice that this is a weaker requirement than having no control (so that treatment need not be completely random across all areas) but comes at some cost in terms of the extent to which estimated effects generalize to areas that are further away from the policy cutoff. This is sometimes characterized as involving a trade-off between internal and external validity (i.e., the researcher gets good estimates of the causal effect for areas around the threshold, but it is not clear whether these would generalize to areas away from the threshold). This distinction provides one example of how the recent program treatment literature has clarified our understanding of how to interpret estimated parameters as well as how to estimate them in the first place.

So far there is nothing specifically spatial about these identification strategies (other than the fact that the policy intervention occurs in specific places), which have been more widely used in other applied microeconomic literatures (particularly in development, education, and labor). However, the fact that the policy intervention occurs in specific places and that these places have a geographical location provides a further source of discontinuity which may be useful in achieving identification.
Specifically, we can use "spatial differencing" to compare treated areas to nearby non-treated areas. If unobservable characteristics vary smoothly over space, then
such a comparison may help control for unobservable characteristics that affect both treatment and outcomes. As with regular (nonspatial) discontinuity designs, the validity and interpretation of the resulting parameter estimates depend crucially on how the borders of treated areas are determined and what happens to the unobservable characteristics of areas at those borders. If unobservable characteristics vary continuously at the border, then spatial differencing may give us the causal effect of the policy even if policy assignment is nonrandom (provided that policy makers do not have perfect control over the location of the boundary that determines the policy area). Even if unobservable characteristics do not vary continuously at the border, spatial differencing may still help if it eliminates larger spatial trends, making it easier to find suitable instruments for the spatially differenced variables (see Duranton et al. (2011) for further details and an application to the impact of local tax rates on employment).

A further complication arises when using spatial differencing if treatment effects spill over geographical boundaries to impact non-treated areas. This spillover might be positive (often referred to as a multiplier effect) or negative (often referred to as displacement). Regardless of the sign of the effect, if the interest is in the overall aggregate impact of the policy for an area that extends beyond the boundary of the treated zone (as it might be, e.g., for Enterprise Zones), such spillovers significantly complicate interpretation of estimated coefficients. Specifically, in the presence of positive spillovers, estimates of the effect of policy are biased downward, and vice versa for negative spillovers. These issues are discussed further in Neumark and Simpson (2015), but the literature is only just beginning to grapple with the resulting complications.

Official evaluations of government policies (i.e., those paid for and sponsored by government) usually make little, if any, use of these program features to help identify the causal impact of policy. For a geographical economist, this significantly complicates the interaction with policy makers, because reports that are less careful about causality are often willing to make much broader claims about the impact of a policy (and how that impact was achieved). As a result, policy makers face a difficult trade-off when trying to decide how to evaluate policies. Wide-ranging "evaluations" that are less careful about causality appear to provide more information as an input into the policy-making process. Taken at face value, such evaluations allow policy makers to both assess value for money and make changes to policy, while appearing to take into account evidence about the impact of the policy. In contrast, empirical research in the program treatment effects tradition often makes fairly narrow claims about whether the policy has a causal impact (and then, sometimes, only for a particular part of the population depending on the methods used).

Of course, there are several arguments in favor of an approach which focuses on a narrower range of issues concerning the causal effects of policy. First, and most important, a policy evaluation that focuses on causal effects should substantially improve our understanding of whether policies such as Enterprise Zones have any net impact (including possibly whether or not they generate or mainly displace economic activity).
This would help future governments when they decide whether to maintain or reintroduce such a scheme. In addition to this core reason, it is also
interesting to note that in many circumstances, government could get this type of policy analysis at little cost because this kind of evaluation has the potential to be published in top academic journals (cf. a number of the references provided above). Such "open evaluation" will not work for all policies (because the degree of academic interest will usually depend on the extent to which the policy "design" allows causal effects to be identified), but it could work for a good proportion of them. In short, when appropriate, policy evaluation of this kind does not need to be big, expensive, and centralized. Instead, it can be outsourced by using open evaluation in the academic (and wider nongovernmental) community.

A major barrier to such an approach to evaluation is, once again, the availability of data. But now the issue concerns the availability of information on the government policy to be evaluated. A first step in moving toward a more open evaluation model would require good information to be recorded at all stages of the policy-making process – for example, whether selection of projects is competitive, how decisions are made, the location and timing of intended and actual expenditure, and the types of expenditure (buildings, capital grants, training) that are funded. Information on bids needs to be available whether successful or not. Nearly all of this information will be available and processed when appraising the bids before a decision is made. The only additional costs involved arise from doing this in a consistent, well-documented manner and in somehow making this data available. Recording all of this detail would involve a small amount of expenditure but does take time at a point when officials are usually under pressure to make decisions and start spending money.

Unfortunately, it is arguable that costs are not the major barrier in terms of data availability. Using policy design to assess causal effects ideally requires government to have detailed information about the decision-making process. How were bids solicited and assessed? How were the winning bids selected? How were funding levels decided? At least in the context with which I am most familiar (the UK), it is remarkable how little of this information is systematically recorded even for internal purposes. I would assume that this problem applies much more widely beyond the UK.

Assuming all this information (on the policy process and outcomes) is available, there is one remaining major barrier. Specifically, effective policy evaluation needs the government to make all this information available to researchers. For all kinds of reasons, governments remain reluctant to do this. Of course, a genuine reason for resistance to transparency is that some of the information may be confidential (more so when it relates to individuals or firms than areas). Fortunately, government departments and statistical agencies do appear to be increasingly willing to find mechanisms for circumventing this specific problem. In the UK, for example, they do this by making data "publicly" available to use in a secure data environment with controlled access and detailed disclosure rules (e.g., the ESRC-funded Secure Data Service). Again, there will be some cost to maintaining this data and providing access to it.

The final barrier to more careful policy evaluation is that government needs to be patient.
Performing the kind of analysis discussed in this section requires data on the policy and on a range of outcome variables – for example, firm performance, employment, and unemployment – for an appropriate number of geographical areas. That outcome data is usually
only available with a time lag of several years, which complicates the interaction between evaluation and policy formulation (because policy makers are often working on shorter time scales). But once the data becomes available, if the policy design is such as to interest academics, researchers will then spend many (unpaid) hours figuring out whether the policy in question had any causal impact on outcomes. In short, with a little patience and transparency, open evaluation has the scope to significantly increase our understanding of the causal impact of government urban policy at very little (direct) financial cost.

In addition, such evaluation can also increase our understanding of how the spatial economy functions. For example, evaluation of place-specific policies can tell us the extent to which other "amenities" are likely to get capitalized into land values. Policy evaluation of transport projects can tell us whether market access (through the transport network) affects productivity. Looking at the impact of training policies can help increase our understanding of local labor markets. The geographical economic literature is only just beginning to explore these issues, but experience from other fields suggests that we might learn a lot more from such an approach.
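
To make the comparisons discussed in this section concrete, below is a minimal sketch of the simplest credible design: a two-period difference-in-differences comparison of areas that received funding against similar areas that applied but were not funded. This is an illustration only, not code from any study cited above; the data, variable names, and effect sizes are all invented.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200  # hypothetical areas that applied for zone funding

# Half the applicants are (as good as randomly) funded among comparable bids.
treated = rng.random(n) < 0.5

# Log employment before the program, with persistent area differences.
emp_before = 5.0 + 0.3 * rng.standard_normal(n)

# After the program: a common macro trend (+0.10) affects everyone,
# and treated areas additionally gain an assumed true effect of +0.05.
emp_after = emp_before + 0.10 + 0.05 * treated + 0.10 * rng.standard_normal(n)

# Difference-in-differences: the change in treated areas minus the change
# in untreated applicants. Fixed area characteristics and the common trend
# difference out, under the (strong) parallel-trends assumption.
did = (emp_after[treated] - emp_before[treated]).mean() - (
    emp_after[~treated] - emp_before[~treated]).mean()
print(f"DiD estimate of the policy effect: {did:.3f}  (true value 0.05)")
```

Comparing funded areas to unsuccessful applicants, rather than to all other areas, is precisely the device used by Busso et al. (2013): the application decision itself holds fixed much of the unobserved selection that would otherwise bias the comparison.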

5 Conclusions

To some extent, this chapter has been concerned with the "process" by which geographical economics influences policies. It has considered several barriers that limit this influence. The chapter has been structured around the three sets of constraints facing academic researchers – specifically in terms of the availability of data, the limitations of spatial analysis, and the role of the evaluation of government policy. But along the way, the chapter has also highlighted several constraints facing policy makers.

Policy makers are accountable for the performance of places. This means that they need to be interested in data for administrative units even if they understand that these might not adequately capture how spatial disparities are developing and what, if any, impact policy is having. A lack of analytical capacity often exacerbates the problems caused by any disconnect between the data used for analysis and that used to assess the performance of different administrative units. In terms of the analysis, policy makers often perceive ethical or political problems with decision-making processes, such as randomization or competitive bidding, which many researchers advocate as "ideal" for evaluation purposes. More careful evaluation calls for up-front costs in terms of systematic data collection but only delivers longer-term results once outcome data becomes available and analysis has been undertaken. Political imperatives, for example, an incentive to show short-term results, can easily override the desire of officials to take a longer-term view of the impact of the policies for which they are responsible.

Some of these issues stem from fundamental conflicts of interest between researchers and policy makers. Others are more easily addressed. Collecting data for more "sensible" spatial units – such as metropolitan areas – can better align the spatial scales used by policy makers and analysts. Using institutional features of policies to help improve the understanding of the causes of spatial disparities
increases the relevance of academic research to the policy-making community. Secure data services allow governments to share data in a way that maintains some control over exactly how that data is used. In turn, open data allows for open evaluation where the academic community can provide longer-term assessments of the impact of policy even if policy makers' attention remains focused on the short term. Of course, addressing all these barriers is only a necessary, but not sufficient, step in ensuring that insights from geographical economics help inform spatial policy. Belief- or principle-based policy making still trumps evidence-based policy making in many situations. But addressing these problems also makes for good geographical economics regardless of any influence on policy. Fortunately, for academic researchers, even if we fail to change the world, improving our understanding of how the world works is hopefully reward enough for our efforts.

6 Cross-References

▶ Agglomeration and Jobs
▶ Endogeneity in Spatial Models
▶ Instrumental Variables/Method of Moments Estimation
▶ Scale, Aggregation, and the Modifiable Areal Unit Problem
▶ Spatial Policy for Growth and Equity: A Forward Look

References
Angrist J, Pischke JS (2009) Mostly harmless econometrics. Princeton University Press, Princeton
Baldwin R, Forslid R, Martin P, Ottaviano GM, Robert-Nicoud F (2003) Economic geography and public policy. Princeton University Press, Princeton
Baum-Snow N, Marion J (2009) The effects of low income housing tax credit developments on neighbourhoods. J Public Econ 93:654–666
Busso M, Gregory J, Kline P (2013) Assessing the incidence and efficiency of a prominent place-based policy. Am Econ Rev 103(2):897–947
Cheshire P, Magrini S (2009) Urban growth drivers in a Europe of sticky people and implicit boundaries. J Econ Geogr 9:85–115
Combes PP, Duranton G, Overman HG (2005) Agglomeration and the adjustment of the spatial economy. Pap Reg Sci 84:311–349
Dachis B, Duranton G, Turner M (2011) The effects of land transfer taxes on real estate markets: evidence from a natural experiment in Toronto. J Econ Geogr 12:327–354
DiNardo J, Lee DS (2010) Program evaluation and research designs. In: Ashenfelter O, Card D (eds) Handbook of labor economics, vol 4. Elsevier, Amsterdam
Duranton G, Overman HG (2005) Testing for localisation using micro geographic data. Rev Econ Stud 72:1077–1106
Duranton G, Gobillon L, Overman HG (2011) Assessing the effects of local taxation using microgeographic data. Econ J 121:1017–1046
Gibbons S, Overman HG, Patacchini E (2015) Spatial methods. Handb Reg Urban Econ 5:115–168
Lee DS, Lemieux T (2010) Regression discontinuity designs in economics. J Econ Lit 48:281–355
Martin R, Sunley PJ (2011) The new economic geography and policy relevance. J Econ Geogr 11:357–370
Neumark D, Simpson H (2015) Place based policies. Handb Reg Urban Econ 5:1197–1287
Overman H (2010) "GIS a job": what use geographical information systems in spatial economics. J Reg Sci 50:165–180
Sacerdote B (2001) Peer effects with random assignment: results for Dartmouth roommates. Q J Econ 116:681–704

New Geography Location Analytics
Alan T. Murray

Contents
1 Introduction
2 Analytics Overview
3 Service Allocation
4 Spatial Optimization
5 Location Modeling
5.1 Transportation
5.2 Congestion
5.3 Dispersion
5.4 Economies of Scale
6 Conclusions
7 Cross-References
References

Abstract

An important specialty in regional science is location theory supported by a rich set of methods and approaches, referred to more generally as location analytics. This chapter provides an overview of new geography location analytics. Issues to be specifically addressed include congestion, dispersion, and concentration but also market size and production scale. A by-product of these concerns is that transportation costs, both distance and travel time, are fundamental. Of course these issues reflect some of the basic tenets of new economic geography. Underpinning much of location analytics is spatial optimization, where mathematical formalization is relied upon to structure a range of models for use in evaluating, extending, and planning for service system design involving facility location and allocation. This enables derivation and application but also communication of goals, objectives, and constraining conditions. Location decisions, trade/market areas, and routing are critical for making goods and services affordable, feasible, and/or profitable. Yet location analytics are not without challenges, among them dealing with big data, model solution, addressing issues of uncertainty, and understanding trade-offs associated with multiple concerns.

A. T. Murray (*)
Department of Geography, University of California at Santa Barbara, Santa Barbara, CA, USA
e-mail: [email protected]

© Crown 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_144

Keywords

Utility maximization · Equity · Optimization · Facility siting · ESDA

1 Introduction

The location of goods and services has profound and long-lasting implications for economic performance, viability, and sustainability. Poorly configured service systems may prove too costly or garner an insufficient customer base. The result is firm or service failure, or the need to relocate and reconfigure a system in order to better compete, provided that this is an option. Alternatively, well-designed service systems foster success in many ways, such as by keeping distribution costs at a minimum, enhancing visibility and exposure, providing superior access and accessibility, enabling rapid response, allowing consideration of fairness issues, etc. Each of these factors, alone or in combination, contributes to firm or service system sustainability over the long term. In most cases, location decisions, trade/market areas, and routing are at the heart of making goods and services affordable, feasible, and/or profitable. For these and other related reasons, location theory and associated analytics have historically been and continue to be an important part of regional science.

New economic geography is effectively an attempt at further generalization of location theory, unifying, integrating, and reconciling important service system characteristics. There is an inherent recognition that a system will self-organize in various ways for a range of reasons. One factor is associated with transportation costs, where access and accessibility are important and congestion degrades service in addition to increasing costs. Another factor is agglomeration, in that there is a benefit to proximity of direct and indirect resources, which could include skilled labor, knowledge, innovation, etc. but also increased consumption that emanates from an interdependent system of systems. Economies of scale are a critical reality associated with decreased per unit service costs that arise when larger systems are able to produce at a higher rate. This is possible through a number of mechanisms, particularly those associated with manufacturing and industrial processes, but also due to specialization, when greater efficiency and better service are achieved through more focused and refined efforts. Critical mass is therefore inherently important, but beyond it lies the emergence of decidedly spatial structure in the form of urban hierarchies and core-periphery relationships. Two important observations regarding new economic geography are that spatial specificity and detail are generally lacking and that its emphasis is rather narrowly on industry/firm behavior.

Location analytics reflect the methods, tools, and techniques that support evaluation and decision-making regarding the siting and configuration of service systems. They can be thought of as a bridge between theory and application, but with specific
attention to spatial specificity and detail. Service systems are virtually everywhere, including the private sector and the public sector. For example, Target is a major private sector retailer in the United States with nearly 1900 stores, 40 distribution centers, and more than 12 global sourcing offices. Providing good access to customers is essential for Target's success, but so is minimizing distribution costs in order to keep products affordable. The location of stores and supporting distribution centers is critical for such a service system as they have major cost implications. A public sector example of a service system is the San Diego Fire-Rescue Department (California) with 52 fire stations and 40 lifeguard stations (9 permanent and 31 seasonal). During the last year (2018), they responded to 159,590 service requests. Essential in this case is quick response by personnel, within minutes of a call for service. Fire response seeks to save lives and property, the latter often associated with extinguishing flames before a structure reaches flashover, beyond which point it is likely a total loss. Medical and lifesaving efforts too require immediate response in order to increase survival odds. In these cases, minutes and seconds can be the difference between life and death.

These public and private sector examples reflect differing geographic scales, with Target spanning the globe and/or continent and the San Diego Fire-Rescue Department having a regional focus, but service systems may be rather micro, such as street lighting in a neighborhood or wireless coverage in a building. Critical location decisions include where to site associated facilities as well as deriving associated service areas along with routing and transportation planning. The facilities may be stores, distribution centers, fire departments, lifeguard stations, street lights, wireless access devices, etc. This must be coupled with service area decisions that are dependent on spatiotemporal distance/travel dynamics. All are interdependent locational decisions that impact system performance and overall costs. The use of location analytics may be oriented toward understanding how an existing service system is performing, or their use may be to plan/design a new system. The criteria supporting analysis may consist of one or more utility functions associated with consideration of profits, costs, access, equity, etc. Additionally, there may be a range of constraining factors, perhaps having to do with budget, geographic configuration, risk, vulnerability, etc. Locational decisions may involve a single facility, multiple facilities, hierarchical structure, interaction, competition, etc. Beyond this, facilities may correspond to a point entity (e.g., emergency warning siren), line entity (e.g., electricity transmission corridor), or area entity (e.g., nature reserve), with options for potential locations being unlimited (continuous space) or finite (discrete space). Depending on the nature of the service system, such as the examples noted above, there are simultaneous decisions involving the location of facilities combined with allocation of demand reflecting routing and/or transportation costs.

This chapter reviews location analytics in a broad regional science context. Issues associated with transportation, congestion, dispersion, concentration, market size, and production scale are touched upon as basic tenets of new economic geography. The remainder of the chapter is structured in the following manner.
The next section provides an overview of analytics. Then, service allocation associated with market
and trade areas is reviewed. Elements and characteristics of spatial optimization are then detailed, including basic structure as well as solution. A review of location models that address a range of new geography issues is given. The chapter ends with conclusions, references, and recommended readings.

2 Analytics Overview

There are of course a range of analytical methods that have been relied upon to support service system evaluation, planning, and management efforts. Most of these methods have a long history of development and technical advancement, along with being applied in a host of contexts. In regional science, Murray (2017) identifies a number of analytics that have been utilized, including GIS (geographic information systems), remote sensing, measures/metrics, statistics, optimization, economic methods, geostatistics, spatial statistics, and geosimulation, among others. Figure 1 provides an illustrative view of regional science analytics, suggesting that any study, analysis, and/or planning effort would involve a combination of these analytical methods used in an integrated fashion. While the intent of this chapter is not to provide comprehensive specification of all analytics used in regional science, a brief overview of the primary methods is offered here. The exception to this is the more detailed discussion of optimization and location modeling given later, as location analytics are the primary emphasis of the chapter. Information collection and data management approaches involving the use of remote sensing, GPS (global positioning system), and GIS have clearly played an increasingly important role in regional science.

[Fig. 1 Spatial analytics commonly used in regional science: measures/metrics, remote sensing, GIS, statistics, optimization, economic, geostatistics/spatial statistics, and geosimulation, all surrounding the problem context]

Remote sensing involves the use of sensors on satellites, aircraft, unmanned aerial vehicles, etc. to produce digital readings for an area that may then be interpreted in some manner. Aerial photos are common, but there are many types of sensors that produce unique signatures for encountered materials or conditions on/near the surface of the Earth, such as grass, trees, farm crops, water, asphalt, concrete, etc. GPS is a collection of orbiting satellites as well as supporting control base stations that can be used by a receiver to identify positional location on the surface of the Earth. This provides time and geographic reference for objects and sites across a region as well as enables tracking of travel routes and movement behavior over space and time. Such data may be subsequently processed, managed, and analyzed in GIS. Accordingly, GIS is a collection of methods to help create/acquire, manage, manipulate, analyze, and display geographic data, with supporting data coming from external sources amenable for input and processing or generated using manual and automated approaches.

A mainstay in analytics is statistical methods. Statistics are essentially approaches for collecting, classifying, presenting, and analyzing data. This includes traditional descriptors like the mean, median, variance, etc. but also other measures and metrics. Statistics provide an important analytical foundation, particularly with respect to probability, sampling, goodness of fit, and structured approaches for hypothesis testing (e.g., estimation, inference, etc.). Parametric and nonparametric approaches for assessing basic correlation, analysis of variance, and regression are widely relied upon, as well as factor analysis and principal components. Dealing more explicitly with geographic information, geostatistics and spatial statistics have emerged as unique analytics for addressing the assessment of point patterns and spatial autocorrelation, but also carrying out spatial regression, spatial interpolation, and spatial clustering, recognizing the technical and practical challenges of dealing with spatiotemporal information.

In what is often viewed as the economic area of regional science, supporting analytics include approaches like the social accounting matrix, input-output analysis, and computable general equilibrium. The intent of these analytics is essentially to examine sectoral relationships and dependencies; assess trade between cities, regions, and countries; and evaluate uncertainties along with policy implications. A final analytic of note in regional science is geosimulation. Associated analytic approaches include cellular automata and agent-based models. These approaches often mimic urban and regional processes, particularly mechanisms of land use change and development.

Much contemporary excitement and discussion associated with analytics focuses on artificial intelligence, machine learning, deep learning, etc. These are interrelated and/or interdependent approaches oriented toward imitation of intelligent human behavior using computers. Of course, the use of algorithms of various sorts to fit, classify, and solve has existed for decades in the quest for knowledge discovery. Examples include simulated annealing, evolutionary (genetic) algorithms, neural networks, k-means, decision trees, etc. They are nothing particularly new or novel, but most certainly they are finding new use and utility in big data environments. Noteworthy has been their application to supervised and unsupervised learning in image processing and classification.
A potentially meaningful way to view individual or integrated analytics in Fig. 1 is that they support exploratory spatial data analysis (ESDA). The use of statistical and other methods along with graphical summaries of any sort is common to exploratory processes, but the focus on distinguishing characteristics of geographic data, spatial dependency, and heterogeneity makes ESDA unique in many ways. Often used in the process of detecting outliers, patterns, and trends as well as developing hypotheses, an expanded perspective, like that of Fig. 1, involves a host of other methods for examination. There has been considerable use of ESDA in regional science to address urban and rural development issues (Murray 2010a).

An important distinction in the use and application of analytics is context. In particular, the notion of descriptive versus prescriptive purpose is a salient perspective. As suggested by the name, descriptive has to do with the use of analytics to characterize and explain. This has been a characteristic trait of location theory, the ability to describe processes and behavior having to do with land use, economic activity, competition, etc. Descriptive analytics may be used as well to predict what will happen given a particular situation or set of circumstances. In contrast, prescriptive analytics are oriented toward identifying a path for achieving a desired outcome. Examples include achieving sustainability goals and ensuring system efficiency; such uses necessarily dictate that specific prescribed actions are needed to establish or alter a system in order to reach identified goals and objectives.
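
As a concrete instance of the spatial statistics and ESDA tools described above, the short sketch below computes Moran's I, a standard measure of spatial autocorrelation, from a value vector and a spatial weights matrix. This is a generic textbook formula, not code from any package discussed in this chapter; the four-region weights matrix and values are invented toy data.

```python
import numpy as np

def morans_i(x, W):
    """Moran's I for values x and a binary spatial weights matrix W."""
    z = x - x.mean()        # deviations from the regional mean
    s0 = W.sum()            # total weight across all pairs of neighbors
    n = len(x)
    return (n / s0) * (z @ W @ z) / (z @ z)

# Toy example: four regions arranged on a line; neighbors share a border.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0])  # values rise smoothly across the line

# A positive value indicates that similar values cluster in space.
print(morans_i(x, W))  # approximately 0.33 for this monotone pattern
```
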

3

Service Allocation

An important aspect of any service system is the allocation of demand (or customers) to facilities, at or from which the service originates. This is critical because often a significant component of costs is travel distance but also typically involves access time which invariably has a cost component as well. Poorly allocated demand to facilities means that costs will likely be high and access and response times longer than necessary. Further, poor access and accessibility degrades service quality and is a competitive disadvantage. Alternatively, a good demand allocation scheme represents one aspect of efficient operation of a system, enhances service quality, facilitates rapid response, etc. A formal definition of allocation then is the determination of which demand are served by which facility. In a descriptive sense, one might be interested in where customers are coming from in order to improve service or devise strategies for attracting additional customers. Alternatively, a prescriptive perspective would be deciding the best assignment scheme to serve customers. Both contexts are reflected in notions of trade/market area, which is the region or zone surrounding a given facility constituting the customers to be served. A considerable emphasis in location analytics is associated with descriptive and prescriptive aspects of service allocation. On the descriptive side, approaches like customer spotting (e.g., ad hoc boundary delineation) and the analog approach (e.g., regression) are commonplace but also are methods such as the gravity model and the Huff model. Alternatively, prescriptive approaches include things like the

New Geography Location Analytics

1347

transportation problem and Voronoi-based methods. Figure 2 illustrates a Voronoi diagram for the given set of facilities and indicates that those customers within a polygon would be allocated to the facility in their associated polygon.

[Fig. 2 Service allocation derived using the Voronoi diagram; legend: facility, allocation boundary]

This process can be formally defined as follows. Given a region R and a set of service facilities, an individual facility j, j ∈ {1, 2, …, m}, can be geographically referenced by coordinates (λ_j, φ_j) if it is assumed to be a point object. Let s_j be the service allocation for facility j, represented as a polygon. The Voronoi diagram satisfies the following conditions:

$$\bigcup_{j} s_j = R \tag{1}$$

$$s_j \cap s_k = \emptyset \quad \forall k \, (k \neq j) \tag{2}$$

where the individual Voronoi polygon associated with each facility is defined as:

$$s_j = \left\{ (\lambda, \varphi) \in R \;\middle|\; d_{(\lambda,\varphi),(\lambda_j,\varphi_j)} \le d_{(\lambda,\varphi),(\lambda_k,\varphi_k)} \;\; \forall k \, (k \neq j) \right\} \tag{3}$$

Equation (1) indicates that the union of all service allocation polygons equals the entire region. That is, the region is partitioned into discrete service areas, with Eq. (2) noting that polygons are non-overlapping. The polygon specification in (3) requires that any location (λ, φ) is associated with, or allocated to, its closest facility as measured using d_{(λ,φ),(λ_j,φ_j)}. This is often Euclidean distance:

$$d_{(\lambda,\varphi),(\lambda_j,\varphi_j)} = \sqrt{(\lambda - \lambda_j)^2 + (\varphi - \varphi_j)^2} \tag{4}$$

The Voronoi diagram for the indicated facilities in Fig. 2 illustrates the associated allocations. There are many assumptions associated with this particular Voronoi diagram, in particular homogeneity in travel, as reflected in the use of Euclidean distance. Recent research in this area has focused on extensions of the Voronoi diagram to reflect heterogeneity in travel, accounting for issues of congestion, travel time, etc. Interestingly, the use of travel time may result in non-convex and/or noncontiguous service areas (polygons) and creates computational challenges in diagram derivation.
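
The allocation rule in Eqs. (1), (2), (3), and (4) is straightforward to operationalize for a finite set of demand points, which is often how it is applied in practice. The following minimal sketch, with invented coordinates, assigns each demand point to its closest facility under Euclidean distance, the discrete counterpart of the Voronoi partition in Fig. 2; substituting a travel-time matrix for the distance computation would give the heterogeneous-travel extension just discussed.

```python
import numpy as np

# Hypothetical planar coordinates (lambda, phi) for facilities and demand points.
facilities = np.array([[2.0, 3.0], [7.0, 8.0], [5.0, 1.0]])
demands = np.array([[1.0, 2.0], [6.0, 7.5], [5.5, 2.0], [3.0, 4.0]])

# Pairwise Euclidean distances, as in Eq. (4).
diff = demands[:, None, :] - facilities[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

# Closest-facility rule of Eq. (3): each demand point is allocated to the
# facility whose Voronoi polygon contains it.
allocation = dist.argmin(axis=1)
for i, j in enumerate(allocation):
    print(f"demand {i} -> facility {j} (distance {dist[i, j]:.2f})")
```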

4 Spatial Optimization

Among the early efforts in regional science that recognized the importance of the then emerging field of optimization was the work by Isard (1958). A wealth of research and interest has followed. The emphasis on spatial here denotes the significance of geography, where explicit and implicit location and proximity relationships play an important role in model specification, application, and solution, as well as the evaluation and decision-making context more generally. Church (2001) discusses that the pioneers of location theory were effectively grounded in notions of spatial optimization, with the work of Johann von Thünen in the 1800s seeking out an optimal arrangement of wood lots, the work of Alfred Weber in the early 1900s wanting to find the optimal location of a manufacturing facility, and the work of Walter Christaller in the 1930s asking questions about the optimal arrangements of towns.

A definition of spatial optimization is that it is the science of optimal arrangement, according to Church (2001). Murray (2017) suggests that it is the formalization and solution of an inherently geographic problem involving decisions, criteria differentiating solution quality, and constraints on decision-making. What makes a spatial optimization problem interesting and unique is that the decisions are typically geographic, such as where to site facilities, what routes to take, which allocations to make, etc. This means that any or all aspects of an optimization model may be spatial in some manner, including but not limited to decision variables, objectives, constraints, coefficients, relationships, etc. Consider a generic optimization model structured in the following manner:

$$\text{Minimize } f(X) \tag{5}$$

$$\text{Subject to: } g_i(X) \le b_i \quad \forall i = 1, \ldots, n \tag{6}$$

where X is a vector of decision variables, X_1, X_2, X_3, …, X_m, f(·) is a function, and g_i(·) is function i. Note that these functions may be linear or nonlinear in form with respect to the decision variables, the unknowns in the model. Structured in Eq. (5) is an objective that is to be optimized through the selection of decision variable values, with the orientation of minimization in this case. The best value of this objective
depends on permissible values of decision variables along with the nature of the function, f(·). In order for evaluation (and optimization) to occur, this function must have an unambiguous specification consisting of coefficients, mathematical operations, and decision variables. Added to this, decision variable options are generally restricted by the model's constraints, Eq. (6). The function g_i(·) must also have an unambiguous specification consisting of coefficients, mathematical operations, and decision variables, and this is true for each function i.

Of course, what makes a model like (5) and (6) spatial are considerations associated with decision variables, coefficients, sets, objectives, and/or constraints. As noted previously in this chapter, spatial decisions include things like where activities should or could take place, for example, the siting of stores, distribution centers, fire departments, lifeguard stations, street lights, wireless access devices, etc. This could also involve making decisions regarding interaction and allocation as well as routing. The associated coefficients in a function may well be spatial as well. For example, it is not uncommon to see distance or travel time in the objective or constraints of a spatial optimization model. There could also be sets specifying use, relationships, or conditions that are spatial. For example, one may know spatial structure through sets representing neighbors or adjacency between geographic objects/entities. This is similarly true for network representations in order to track incoming and outgoing links incident to nodes. The objective(s) of a spatial optimization model may also be geographic in nature. For example, the objective may derive distance or travel time, perhaps making use of Eq. (3). Finally, the constraints of a spatial optimization model could be geographic through the imposition of connectivity or contiguity requirements, as an example. In the above ways, and others, models may be conceived and structured to explicitly and implicitly reflect geographic relationships, making them spatial optimization problems.

In order to demonstrate the characteristics of spatial optimization, notation is first introduced. Consider the following:

i = index of customers (1, …, n)
j = index of facilities (1, …, m)
c_ij = cost of shipping one unit of goods from facility j to customer i
δ_i = total goods required by customer i
σ_j = total goods produced at facility j

The necessary decision variables, the unknowns, are the following:

Z_ij = amount of goods to be transported from facility j to customer i

Given this notation, a prominent spatial optimization approach for supporting allocation decisions is the classic transportation problem. It may be formulated as follows:

$$\text{Minimize } \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij} Z_{ij} \tag{7}$$

$$\text{Subject to: } \sum_{j=1}^{m} Z_{ij} \ge \delta_i \quad \forall i = 1, \ldots, n \tag{8}$$

$$\sum_{i=1}^{n} Z_{ij} \le \sigma_j \quad \forall j = 1, \ldots, m \tag{9}$$

$$Z_{ij} \ge 0 \quad \forall i = 1, \ldots, n \text{ and } j = 1, \ldots, m \tag{10}$$

The objective, (7), seeks a solution of minimum total transportation cost. It is spatial in two ways. Firstly, the costs are inherently spatial as they are a function of distance, where cij = f(dij) assuming that dij is the distance from facility j to customer i, consistent with its usage above. In this case the function in (7) is linear with respect to the decision variables. Secondly, the decision variables, Zij, correspond to a spatial allocation of goods and/or services, ultimately establishing service/trade/market areas. Constraints (8) require that demand needs for each customer i, δi, be satisfied. Constraints (9) stipulate that supply from each facility j does not exceed its capacity, σ j. Non-negativity conditions are established in constraints (10). Collectively, given that customers and facilities are located on the surface of the earth, this is inherently a spatial optimization problem. Again, optimization models are useful in a variety of contexts, ranging from describing an existing system and benchmarking performance characteristics to prescriptive service system design where new facilities will be established that best meet the needs of demand for service. The model is of course one aspect of the problem. For a model to be of any use, it must be possible to solve it in some way. Depending on the formulation and structure of the associated functions, there are three basic types of models: linear program, integer program, and nonlinear program. Of course, there are other nuances and defining conditions, but this is beyond the scope of this chapter. Of more significance, however, are the basic approaches for solving these models. The two distinctions to be highlighted here are exact and heuristic methods. An exact method denotes that under satisfied conditions, the approach will identify the optimal, or best, solution. For a linear programming problem, an example of an exact approach is the simplex method. An exact solution approach for an integer programming problem is the simplex method coupled with branch and bound. A nonlinear programming problem can be approached in a number of ways, but an exact method generally would involve the evaluation of partial derivatives and regularity conditions, the so-called Karush-Kuhn-Tucker conditions. Approaches for which an optimal solution is not guaranteed are heuristics. These are typically ad hoc, rules of thumb, or analogy-based methods. They might produce good solutions, and they may be computationally efficient, but in the end they cannot prove that an identified solution(s) is optimal in any way. As is evident throughout

As is evident throughout this book, optimization permeates regional science, with many chapters highlighting the structure, usage, application, utility, nuances, etc. of optimization, but also the significance of geographic space as well as the challenges faced in practice. For these reasons, this section specifically emphasized spatial optimization and what makes it unique and different. The section that follows further demonstrates the importance of spatial optimization through explicit location and allocation decision-making.

5 Location Modeling

As noted previously, location analytics reflect a broad set of methods, tools, and techniques for evaluation and decision-making associated with the siting and configuration of service systems. Prominent among these have been location models that reflect theoretical and application-oriented considerations, often structured in mathematical terms and solved using exact and heuristic approaches. Overviews of location models can be found in Hurter and Martinich (1989), Church and Murray (2009), and Murray (2010b), among others. As approached in this chapter, location models are often defined along the lines suggested in Eqs. (5) and (6), where the decisions to be made are explicit and placed in the context of one or more utility functions representing objectives along with any constraining conditions in the form of equalities or inequalities. Such models are then a form of communication, making clear what is being structured: assumptions, preferences, relationships, etc. Again, worth reiterating is that location models can be used in both descriptive and prescriptive ways, depending on context and study orientation. In addition to addressing fundamentals of new economic geography associated with transportation, congestion, dispersion, concentration, and scale, the intent of this chapter is to review location models that distinguish continuous and discrete space approaches, doing so through the use of functions that are linear and nonlinear with respect to decision variables.

5.1 Transportation

A popular and widely utilized location model is that attributed to Alfred Weber, as noted previously, seeking the optimal location of a manufacturing facility based on explicit consideration of transportation costs. Church (2019) notes that this basic problem has been extended in a number of ways but also is open to reinterpretation of original intent and utility. The focus in this section is the multiple facility context of this approach, therefore requiring decisions on customer/demand allocations as well as facility locations. This is known as the multifacility Weber location-allocation problem and is illustrated in Fig. 3 (for three facilities). Consider the following notation:


Fig. 3 Multifacility location and allocation (legend: selected facility; demand; service allocation)

i = index of demand (1, ..., n)
j = index of facilities (1, ..., p)
δi = amount of material to be shipped to/from demand i
(λi, φi) = coordinates of demand i
p = number of facilities to be located

The necessary decision variables, unknowns, are the following:

Zij = 1 if demand i is to be served/allocated to facility j, 0 otherwise
(Λj, Φj) = location of facility j

Given this notation, specification of the multifacility Weber location-allocation model is as follows:

$$\text{Minimize} \quad \sum_{i=1}^{n} \sum_{j=1}^{p} \delta_i Z_{ij} \sqrt{\left(\lambda_i - \Lambda_j\right)^2 + \left(\varphi_i - \Phi_j\right)^2} \tag{11}$$

$$\text{Subject to:} \quad \sum_{j=1}^{p} Z_{ij} = 1 \quad \forall i = 1, \ldots, n \tag{12}$$

$$Z_{ij} \in \{0, 1\} \quad \forall i = 1, \ldots, n \text{ and } j = 1, \ldots, p \tag{13}$$

The objective, (11), is oriented toward the least cost allocation of demand to sited facilities. Accordingly, objective (11) is nonlinear, in part due to the location decision variables Λj and Φj under the radical, but also because this radical is multiplied by the allocation decision variables Zij. Constraints (12) indicate that each demand i must be allocated to exactly one facility. Binary restrictions on allocation decision variables are specified in constraints (13). Note that no constraint is given for decision variables Λj and Φj. The reason for this is that they are unrestricted in value, so they may be positive or negative. The optimal values (locations) will depend on the spatial distribution of demand and the underlying reference system.

Worth further discussion is that objective (11) reflects costs as a function of distance, Euclidean in this case. Of course other forms of distance or travel time could be incorporated. Further, the model could be extended in various ways, depending on application context, through the use of additional constraints, parameterization, distance metrics, etc.

An important consideration in applying the multifacility Weber location-allocation model in practice is its solution. It continues to be the focus of research efforts because it is extremely difficult to solve, in terms of both computational effort and solution quality. In general terms, heuristics are often used to solve this model, with the alternating procedure being a common approach given that it is relatively efficient and identifies high-quality results. Other heuristics for solving this model include the projection method, gradient search, variable neighborhood search, and metaheuristics like genetic algorithms. While limited in potential application, exact methods have been developed based on transformation approaches.
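As a rough illustration of the alternating procedure mentioned above, the sketch below couples a nearest-facility allocation step with a location step that re-solves each single-facility Weber subproblem using the classic Weiszfeld procedure. The function names, iteration limits, and single random start are our own simplifying choices; a production implementation would add convergence tests and multiple restarts.

```python
import numpy as np

def weiszfeld(points, weights, iters=100, eps=1e-9):
    """Weighted single-facility Weber point via the Weiszfeld procedure."""
    x = np.average(points, axis=0, weights=weights)   # start at weighted centroid
    for _ in range(iters):
        d = np.maximum(np.linalg.norm(points - x, axis=1), eps)  # avoid /0
        w = weights / d
        x = (points * w[:, None]).sum(axis=0) / w.sum()
    return x

def alternating_weber(points, weights, p, iters=50, seed=0):
    """Cooper-style alternating locate-allocate heuristic for p facilities."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=p, replace=False)].copy()
    for _ in range(iters):
        # Allocation step: assign each demand to its nearest facility (the Z_ij).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        alloc = dists.argmin(axis=1)
        # Location step: re-solve the 1-facility problem within each allocation.
        for j in range(p):
            members = alloc == j
            if members.any():
                centers[j] = weiszfeld(points[members], weights[members])
    return centers, alloc
```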

5.2 Congestion

The impacts of congestion in service systems are significant. Infrastructure is invariably limited in various ways, so accounting for this explicitly is often important in service system design. For example, an imbalance in facility workloads can be inequitable but may also degrade service quality, both potentially resulting in marginal economic returns due to performance and other issues. An approach for addressing congestion in location modeling has involved constraining the service provided by any one facility. In doing so, this can help balance facility workloads, ensuring that no facilities are underutilized and no facilities are overutilized. A range of location models have been developed to account for congestion. Consider first the following notation:

i = index of customers (1, ..., n)
j = index of potential facility sites (1, ..., m)
cij = cost of shipping one unit of goods from facility j to customer i
γj = opening/fixed cost for a facility at j
σj = capacity (total goods produced) of facility j
δi = demand desired by customer i

Decision variables, unknowns, to be used are the following:

Zij = fraction of demand supplied to customer i from facility j


Xj = 1 if facility j is located, 0 otherwise

The notation has changed slightly, with the allocation decision variable now representing the fraction of demand served by a facility. In contrast to the continuous space multifacility Weber location-allocation model, it is now assumed that the potential facility sites are fixed and finite in number. The index j therefore references potential facility sites that have been identified for consideration. A discrete location model to account for fixed costs in establishing a facility as well as congestion through workload capacities is now reviewed. This is commonly referred to as the capacitated plant location model:

$$\text{Minimize} \quad \sum_i \sum_j \delta_i c_{ij} Z_{ij} + \sum_j \gamma_j X_j \tag{14}$$

$$\text{Subject to:} \quad \sum_j Z_{ij} = 1 \quad \forall i \tag{15}$$

$$\sum_i \delta_i Z_{ij} \leq \sigma_j X_j \quad \forall j \tag{16}$$

$$0 \leq Z_{ij} \leq 1 \quad \forall i, j \tag{17}$$

$$X_j \in \{0, 1\} \quad \forall j \tag{18}$$

Objective (14) is comprised of two components. One is the transportation/shipping costs associated with allocations of demand to/from a facility. The second component is the fixed costs to establish a facility. Constraints (15) indicate that each demand must be allocated to exactly one facility. Constraints (16) are capacity constraints, which effectively limit the maximum demand allocation to any one facility. These are necessary, as discussed previously, for dealing with congestion issues. Constraints (17) bound allocation decision variables to be in the range of zero to one. Binary restrictions on location decision variables are specified in constraints (18).

The capacitated plant location problem is similar in many ways to a variety of location models. It is a natural extension of the transportation problem, (7), (8), (9), and (10), detailed previously, where allocation is scaled to be a fraction, and there are now decisions regarding which facilities to include in the service system. Worth noting is that costs could be extended in various ways, perhaps to account for things like volume expansion, etc. To address this, the costs cij would be modified accordingly, resulting in costs c′ij that can be substituted into objective (14). Again, congestion issues are being addressed through the use of capacity constraints. Fractional assignments will likely arise in some instances when the maximum threshold of service at a facility is reached, reflecting partial demand being served by two or more sited facilities in order to satisfy need. The reason for this is insufficient capacity to serve all demand by a single facility. As a result, there may necessarily be instances where some or all demand is allocated to a facility that is farther away than the closest, or least cost, option.

Because of the capacity restrictions, this model is often challenging to solve. Given the integer requirements for the location decision variables, Xj, considerable branching and bounding may be necessary in practice if an exact approach like linear programming with branch and bound is utilized. A range of heuristics has also been developed and applied for solution, including greedy adding, greedy drop, alternating, vertex substitution/interchange, Lagrangian relaxation, and metaheuristics like Tabu search, among others.
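Since the capacitated plant location model is a mixed-integer program, it can be handed directly to an off-the-shelf solver. The sketch below expresses (14), (15), (16), (17), and (18) with the open-source PuLP modeling library; the choice of PuLP and the function signature are ours, and any comparable modeling tool would serve.

```python
import pulp

def capacitated_plant_location(cost, fixed, capacity, demand):
    """Capacitated plant location model (14)-(18) as a mixed-integer program."""
    n, m = len(demand), len(fixed)
    prob = pulp.LpProblem("CPLP", pulp.LpMinimize)
    # Z[i][j]: fraction of demand i served from site j; bounds give (17).
    Z = [[pulp.LpVariable(f"Z_{i}_{j}", 0, 1) for j in range(m)]
         for i in range(n)]
    # X[j]: binary siting decision, giving (18).
    X = [pulp.LpVariable(f"X_{j}", cat=pulp.LpBinary) for j in range(m)]
    # Objective (14): shipping costs plus fixed facility costs.
    prob += (pulp.lpSum(demand[i] * cost[i][j] * Z[i][j]
                        for i in range(n) for j in range(m))
             + pulp.lpSum(fixed[j] * X[j] for j in range(m)))
    for i in range(n):      # (15): each demand fully allocated
        prob += pulp.lpSum(Z[i][j] for j in range(m)) == 1
    for j in range(m):      # (16): capacity available only at opened sites
        prob += (pulp.lpSum(demand[i] * Z[i][j] for i in range(n))
                 <= capacity[j] * X[j])
    prob.solve()
    return prob
```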

5.3 Dispersion

The notion of dispersion in location modeling has to do with the recognition that there are negative externalities associated with the concentration of certain types of services and associated facilities. As an example, multiple facilities offering the same service in the same block/neighborhood of a city may have a number of unintended consequences. Recycling facilities may produce excessive noise and odor. Group homes provide treatment for addiction, life skills, socialization, etc., often to individuals that have been mandated by the judicial system to undergo treatment. When such services are concentrated, this increases the likelihood of local problems, issues, hazards, etc. Concentrated local impacts along these lines may decrease property values, increase service costs, create community backlash, etc. As a result, an important area of modeling focuses on location and allocation when facilities are to be dispersed in some manner. Figure 4 contrasts dispersed and concentrated service facilities.

There are a number of modeling approaches for supporting dispersion decisions. Consider the following notation:

j = index of facilities (1, ..., p)
R = region of analysis
(λ, φ) = reference to a location in R, (λ, φ) ∈ R
p = number of facilities to be located

The necessary decision variables, unknowns, are the following:

(Λj, Φj) = location of facility j

The location model that follows is referred to as the continuous space p-center problem:

$$\min_{(\Lambda_j, \Phi_j) \in R} \left\{ \max_{(\lambda, \varphi) \in R} \left[ \min_j \sqrt{\left(\lambda - \Lambda_j\right)^2 + \left(\varphi - \Phi_j\right)^2} \right] \right\} \tag{19}$$


Fig. 4 Dispersed and concentrated service facilities. (a) Dispersed. (b) Concentrated

The objective, (19), indicates a goal of minimizing the maximum distance of demand to its closest facility. Distance in this case is the Euclidean metric detailed previously, (4). The model is nonlinear because of the decision variables under the radical. Of note is that there are no constraining conditions per se, though the facilities are restricted to be in the study region in objective (19). Comparatively, it is important to note that this is a continuous space problem in the sense that facilities may be located anywhere in the region. The solution of the continuous space p-center model has involved both heuristic and exact methods because it is extremely difficult to solve. Common heuristics include add, drop, and swap approaches. The Voronoi diagram, (1), (2), and (3), has been used as part of a heuristic process and has proven particularly effective. Other approaches include relaxation, perturbation, and neighborhood search. Under certain conditions, for relatively small problem instances, exact approaches have been successful.
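One simple way to experiment with objective (19) is to discretize the region into a grid of sample points and apply a general-purpose search to the resulting minimax function. The sketch below does so with a Nelder-Mead search from a random start; this is purely illustrative, may stall at a local optimum, and is not one of the specialized p-center procedures cited above.

```python
import numpy as np
from scipy.optimize import minimize

def p_center_objective(flat_centers, region_pts, p):
    """Objective (19) on a discretized region: worst-served sample point."""
    centers = flat_centers.reshape(p, 2)
    d = np.linalg.norm(region_pts[:, None, :] - centers[None, :, :], axis=2)
    return d.min(axis=1).max()

# Approximate a unit-square region R by a regular grid of sample points.
g = np.linspace(0.0, 1.0, 21)
region_pts = np.array([(x, y) for x in g for y in g])

p = 3
rng = np.random.default_rng(1)
x0 = rng.random(2 * p)    # random initial facility coordinates in R
res = minimize(p_center_objective, x0, args=(region_pts, p),
               method="Nelder-Mead")
print(res.x.reshape(p, 2))   # heuristic facility locations
print(res.fun)               # maximum distance to the nearest facility
```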

5.4 Economies of Scale

The last location model to be discussed in this chapter addresses issues associated with economies of scale. Hubs are special types of facilities where important activities take place, such as switching, sorting, connecting, consolidation, or break-bulk, in the course of travel between origins and destinations. There are significant benefits offered by hub-based service systems, but most prominent are reduced per unit travel costs (economies of scale) associated with demand consolidation between hubs. This is possible in a variety of ways, primarily through better system efficiency but also better service response. There are obviously many examples of hub-based service systems, but air travel and courier delivery are likely the most prominent. The intent of such a service system is therefore to locate/establish hub facilities and allocate demand to hubs such that the greatest efficiency is achieved. Distinguishing characteristics of a hub system include that demand is based on origin-destination travel and that travel goes through hubs (Alumur 2019). This is illustrated in Fig. 5.

Fig. 5 Hub-based facility service system (legend: selected hub facility; potential facility (not selected); demand; service allocation; hub interaction)

Consider the following notation:

i (also l) = index of customers (1, ..., n)
j (also k) = index of hub facilities (1, ..., m)
wil = demand originating at i destined for l
cij = cost of shipping one unit of goods from hub facility j to customer i
cjk = cost of shipping one unit of goods between hub facilities j and k
α = discount factor accounting for scale effects
p = number of hub facilities to be located

Decision variables, unknowns, to be used are the following:

Zij = 1 if customer i is allocated to hub facility j, 0 otherwise
Xj = 1 if hub facility j is located, 0 otherwise

It is important to note that the economies of scale parameter is α, enabling cost savings through higher volumes of activity. Given this notation, it is possible to formulate the single allocation p-hub median model as follows:

$$\text{Minimize} \quad \sum_i \sum_l \sum_j \sum_k w_{il} \left( c_{ij} Z_{ij} + \alpha c_{jk} Z_{ij} Z_{lk} + c_{lk} Z_{lk} \right) \tag{20}$$

$$\text{Subject to:} \quad \sum_j Z_{ij} = 1 \quad \forall i \tag{21}$$

$$\sum_j X_j = p \tag{22}$$

$$Z_{ij} \leq X_j \quad \forall i, j \tag{23}$$

$$Z_{ij} \in \{0, 1\} \quad \forall i, j \tag{24}$$

$$X_j \in \{0, 1\} \quad \forall j \tag{25}$$

Objective (20) seeks to minimize the total costs of collecting demand from its origin, transferring it between hubs, and delivering it to its intended destination. The economies of scale factor, α, is applied in (20) to flows routed between two hubs. Constraints (21) specify that demand originating at i is to be allocated to a hub. Constraints (22) indicate that p hubs are to be located. Constraints (23) limit allocation to only hubs that have been located. Binary requirements are established in constraints (24) and (25). The objective is nonlinear in this case. However, it is possible to linearize it in practice through the introduction of additional variables. While only the single allocation version is detailed here, it is possible to extend this in various ways, including multiple allocation, center, coverage, capacity, and continuous space considerations.

Hub system design models are generally challenging to solve. As a result, research continues on efficient and effective solution approaches. Exact methods based on linear programming with branch and bound are common, though the problem sizes that can be addressed are, in general, extremely limited. Accordingly, heuristic methods have been essential, with exchange, Tabu search, simulated annealing, GRASP, Lagrangian relaxation, and neural network approaches, among others, proving effective in various planning contexts.
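To fix ideas, once a single-allocation solution is given, objective (20) decomposes for each origin-destination pair into three legs: collection to the origin's hub, a discounted inter-hub transfer, and delivery from the destination's hub. The fragment below evaluates that cost for a candidate allocation; for compactness it assumes hubs are co-located with customer nodes, so a single symmetric cost matrix serves for both cij and cjk, and it illustrates the objective rather than solving the model.

```python
import numpy as np

def hub_median_cost(w, c, alloc, alpha):
    """Evaluate objective (20) for a given single-allocation solution.

    w[i, l]  : flow originating at i destined for l
    c        : symmetric unit shipping cost matrix over all nodes
               (hubs assumed co-located with customer nodes for compactness)
    alloc[i] : index of the hub to which customer i is allocated
    alpha    : inter-hub discount factor, 0 < alpha <= 1
    """
    n = len(alloc)
    total = 0.0
    for i in range(n):
        j = alloc[i]                    # origin hub for i
        for l in range(n):
            k = alloc[l]                # destination hub for l
            # collection + discounted inter-hub transfer + delivery
            total += w[i, l] * (c[i, j] + alpha * c[j, k] + c[k, l])
    return total
```

Embedding such an evaluation inside an exchange or Tabu search loop yields the kind of heuristics referenced above.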

6 Conclusions

This chapter reviewed location analytics as a component of regional science. As mentioned early in the chapter, new economic geography attempts to generalize location theory. A number of characteristics of new economic geography were noted, including self-organization, access and accessibility, agglomeration, increasing returns, specialization, critical mass, and urban hierarchies and core-periphery relationships. Further, transportation, congestion, dispersion, and economies of scale in relation to new economic geography were specifically reviewed, with location models detailed that explicitly incorporate aspects of such considerations within structured approaches. Location analytics are therefore a bridge between theory and application. Further, location analytics reflect attention to spatial specificity and detail in service system performance, evaluation, and design. This is significant because self-organization is the by-product of microscale behavior and actions, necessitating the use of models and modeling approaches that reflect increased understanding of processes and interactions.


An important contribution was consideration of location analytics as part of the more encompassing analytics trend, but also of the ways in which service allocation as well as location decision-making can be approached through the use of spatial optimization. The broad range of service systems addressed using location analytics is noteworthy because it spans both private and public contexts, expanding beyond the consideration of industry and firm behavior to include reconciliation of issues of equity in public sector contexts as well. The ability to address issues of sustainability, expenditure of limited public funds, emergency response, public safety, etc. is highly dependent on sound approaches for evaluation and decision-making. Such contexts are difficult to reconcile because associated goals and objectives are often conflicting and competing.

Given space limitations, it was not possible to discuss and formulate the broader range of location models that have been developed, nor extensions and more nuanced approaches that have been considered. It is hoped that the included sample of models highlights the descriptive and prescriptive contexts possible, along with planning that involves location and allocation choices/evaluation.

The chapter also omitted discussion of challenges faced in the application of location analytics. Firstly, GIS and remote sensing highlight the prevalence of spatial information, generated at increasing rates across the globe. This is the essence of big data, with volumes that exceed the capabilities of traditional approaches to analysis and summary. Analytics are certainly valuable for trying to understand such data but invariably face practical limits in many cases. Certainly this is true for location analytics, and to date it is fair to say that capabilities for dealing with big spatial data are rather limited. Secondly, an interesting aspect of location modeling is solution, as noted previously. It is interesting not only because exact and heuristic solution remain challenging for many problems, with continued interest in greater efficiency and better effectiveness, but also because many approaches themselves can constitute big data on account of the intermediate data, processing, and/or storage needed. For example, exact solution using linear programming with branch and bound necessitates iterative processing when branching, growing at significant rates for some problem applications. Many heuristics too engage in processes that result in big data. This will continue to be an issue going forward. Thirdly, addressing uncertainty in the spatial information that serves as input for location analytics has only been done in limited ways, with little general understanding of its impacts. While the modifiable areal unit problem has seen much attention in location analytics, this attention is limited to a relatively small number of modeling contexts. Beyond this, there has been little systematic exploration, summary, or understanding of the implications that data uncertainty has for obtained results, nor investigation of the stability of optimal results under uncertain conditions. Finally, understanding of trade-offs associated with multiple concerns is limited.

Collectively then, the field of location analytics within regional science remains an active and vibrant domain, with much potential for meaningful and sustained contributions going forward.

7 Cross-References

▶ Cellular Automata and Agent-Based Models
▶ Classical Contributions: Von Thünen and Weber
▶ Cross-Section Spatial Regression Models
▶ Geographic Information Science
▶ Geospatial Analysis and Geocomputation: Concepts and Modeling Tools
▶ Interregional Input-Output Models
▶ Market Areas and Competing Firms: History in Perspective
▶ Network Equilibrium Models for Urban Transport
▶ Spatial Dynamics and Space-Time Data Analysis

References
Alumur SA (2019) Hub location and related models. In: Eiselt H, Marianov V (eds) Contributions to location analysis. Springer, New York, pp 237–252
Church RL (2001) Spatial optimization models. In: Smelser N, Baltes P (eds) International encyclopedia of the social & behavioral sciences. Elsevier, Amsterdam, pp 811–818
Church RL (2019) Understanding the Weber location paradigm. In: Eiselt H, Marianov V (eds) Contributions to location analysis. Springer, New York, pp 69–88
Church RL, Murray AT (2009) Business site selection, location analysis, and GIS. Wiley, Hoboken
Hurter AP, Martinich JS (1989) Facility location and the theory of production. Kluwer Academic, Boston
Isard W (1958) Interregional linear programming: an elementary presentation and a general model. J Reg Sci 1(1):1–59
Murray AT (2010a) Quantitative geography. J Reg Sci 50(1):143–163
Murray AT (2010b) Advances in location modeling: GIS linkages and contributions. J Geogr Syst 12(3):335–354
Murray AT (2017) Regional analytics. Ann Reg Sci 59(1):1–13

Further Reading
Church RL, Murray A (2018) Location covering models: history, applications and advancements. Springer, Cham
Daskin MS (2013) Network and discrete location: models, algorithms, and applications, 2nd edn. Wiley, Hoboken
Drezner Z, Hamacher HW (eds) (2002) Facility location: applications and theory. Springer, Berlin/New York
Love RF, Morris JG, Wesolowsky GO (1988) Facilities location: models and methods. North Holland, New York/Amsterdam
ReVelle C (1987) Urban public facility location. In: Mills ES (ed) Handbook of regional and urban economics, vol II. Elsevier, Amsterdam, pp 1053–1096

Geography and Entrepreneurship
Martin Andersson and Johan P. Larsson

Contents
1 Introduction
2 Why Study the Geography of Entrepreneurship?
3 Entrepreneurs and Their Support Systems
3.1 Entrepreneurship: Individual Characteristics
3.2 Entrepreneurship: Demand Factors
3.3 Entrepreneurship: Supply Factors
3.4 Social but Real: Local Informal Institutions
3.5 Local Formal Institutions: Permits and Politics
3.6 The “Correct” Spatial Level of Observation
4 Concluding Remarks
5 Cross-References
References

Abstract

We take selective stock of the literature on the geography of entrepreneurship. This literature has focused on rates of new firm formation, whereas studies of the geography of “high-impact” entrepreneurship are still scant. Entrepreneurship may be described as a “regional event” in that entrepreneurs derive necessary resources from local markets, but also from regional spillovers, social networks, and engrained factors of the local industrial geography. We point to a few fields of analysis where causal or more detailed analyses are warranted.

M. Andersson Department of Industrial Economics, Blekinge Institute of Technology, Karlskrona, Sweden e-mail: [email protected] J. P. Larsson (*) Department of Land Economy, University of Cambridge, Cambridge, UK e-mail: [email protected] © Springer-Verlag GmbH Germany, part of Springer Nature 2021 M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_145


Keywords

Entrepreneurship · Regional development · Geography of entrepreneurship

1 Introduction

Perhaps the only thing ubiquitous about entrepreneurship is its great variability. The rate at which new firms form, their growth ambitions, the “quality” of existing entrepreneurship, and many aspects of the entrepreneurs themselves vary among groups of people and hence across the map. Economic geographers and regional scientists have devoted much attention to better understanding the relationship between local conditions and entrepreneurship. The main empirical regularity that has been subject to investigation in this literature is the sources of spatial variation in local rates of new firm formation (Fritsch and Storey 2014).

The link between entrepreneurship and growth is generally well-established. Entrepreneurship is also typically seen as an important mechanism by which the economy introduces novelty and experiments with new technology, knowledge, and solutions (Kerr et al. 2014; Lindholm-Dahlstrand et al. 2019). Because our economies are renewing and changing at a fast rate and we face significant challenges, such as climate change, the agents that bring about structural change and develop new solutions are more relevant than they have ever been (Wennekers and Thurik 1999).

At the local level, a large literature analyzes the effect of local rates of new firm formation on regional growth and development. A growing body of evidence now supports the view that a high rate of local new firm formation boosts growth and development (Glaeser et al. 2015; Fritsch 2013). The evidence is also increasingly causal: entrepreneurs do not just follow growth – they lead it (e.g., Glaeser et al. 2015; Lee 2017). That is, if some places exhibit consistently higher start-up rates over a sustained time period, we should, all else equal, expect those places to pull ahead in terms of growth rates and development.

Our understanding of the quality of entrepreneurship has improved as well. A cornerstone of the Global Entrepreneurship Monitor (GEM) project, for instance, is the ambition to give researchers tools to analyze different types of entrepreneurship. This perspective has to date not been sufficiently extended to regional science and economic geography. A main task for economic geography in the twenty-first century should be to elucidate the determinants and consequences of the quantity and the quality of local entrepreneurship, as well as the torrent of ripple effects that entrepreneurship has through local institutions and labor markets.

To understand the geography of entrepreneurship, we need to elucidate at least three broad categories of explanations for the frequency and quality of entrepreneurship, including the interactions of those categories. First, we need to understand which individuals are more likely to start firms and how such individuals sort


themselves across the geography. This category includes analyses of personal characteristics of the entrepreneurs, such as their personality and demography, and their location choices. Second, we need to understand the role of supply- and demand-side factors, including how entrepreneurs benefit from spillover effects and local market potential. Third, we need to understand the role that formal and informal institutions play in fostering (or hindering) entrepreneurial behaviors and how such institutions are developed and maintained over time. For instance, the fact that a region has good “economic fundamentals” and individuals that spot entrepreneurial opportunities does not imply that those individuals choose to act. New firms are started by people who act on perceived opportunities, and their decision to act or not is influenced by local institutions, e.g., in the form of social acceptance of entrepreneurship as well as the formal regulatory and policy environment. Regional science has much to say about all three categories of explanations.

This chapter presents a selective summary of the literature on the geography of entrepreneurship. In Sect. 2 we first provide some arguments as to why the geography of entrepreneurship matters. Section 3 then reviews the main supply- and demand-side factors that determine the local presence of entrepreneurship. Section 4 concludes.

2 Why Study the Geography of Entrepreneurship?

Entrepreneurs are widely acknowledged to be important “change agents” in the economy (Acs et al. 2009): it is the entrepreneur who finds and recognizes entrepreneurial opportunities and amasses enough resources and courage to act upon the opportunity. The entrepreneur will, as a by-product, spread knowledge of opportunities and of ways of working and lead by example to inspire others in their social surroundings.

One perspective on the entrepreneur has its roots in the Schumpeterian school of thought and builds on Schumpeter’s notion of “creative destruction.” According to this view, entrepreneurs drive growth and economic dynamism by introducing innovations, new technology, and new ways of doing business. This perspective puts emphasis on “high-quality” and innovative new firms that disrupt established business models and bring new technologies and knowledge to the market.

Another perspective on entrepreneurship is provided by Kirzner (1973), who emphasizes that entrepreneurship also has an equilibrating role in the economy. Examples of such entrepreneurship are new businesses of more mundane forms, such as coffee shops, restaurants, and electricians, which develop to satisfy local demand. While some scholars argue that genuine entrepreneurship should mainly be thought of as being of the former kind (Henrekson and Sanandaji 2014), in the context of regional growth and development there are arguments that both Schumpeterian and Kirznerian types of new businesses matter.


Schumpeterian new firms may serve an important function by creating jobs, inducing structural change, and driving the creation of novelty in cities and regions. But more mundane forms of new business also matter at the local level for several reasons, even if the growth potential of individual firms may be limited. First, new firms in mundane activities, like regular restaurants, coffee shops, carpentry, and so on, fulfill an important function by developing a city’s or region’s variety of consumer and producer services that help shape the overall attractiveness of the place. Such businesses are, for example, vital in so-called placemaking. Second, new firm formation in mundane types of activities also implies an increase in the presence of local role models, which develops knowledge and information about the practice of running a business. This may help to develop a local “culture of entrepreneurship” (cf. Fritsch and Wyrwich 2014; Andersson and Larsson 2016; Minniti 2005), and a higher density of local entrepreneurs may create pressure on local policymakers to develop regulations and enforcement procedures that are more business-friendly (Andersson and Henrekson 2015). Third, even in its most mundane form, entrepreneurship can lift at least one person out of dependence on social support or other people, namely, the entrepreneur himself. Finally, even business owners with the lowest of aspirations may run into an unexpected opportunity to grow their business and/or draw on their experience and start another business in a new niche, even if by chance and not as part of their initial plan.

The empirical literature on the effect of new firm formation on regional growth and development typically finds significant positive effects on long-run employment growth (see Fritsch 2013, for a survey). Many papers in this vein examine the influence of the overall rate of new firm formation, small business density, or self-employment rates, and such measures are more likely to capture everyday entrepreneurship rather than “radical” and high-potential start-ups (cf. Henrekson and Sanandaji 2014). This debate points to the wider role of new firm formation in regional growth and development, but also to the need to broaden our conception of “entrepreneurship.”

3 Entrepreneurs and Their Support Systems

This section proceeds by tracing the role of entrepreneurs and their interaction with the local milieu. Most arguments below are in principle valid whether we are talking about a high-impact entrepreneur in the Schumpeterian tradition or a marginal improver in the Kirznerian sense. However, it is easy to see that several of the arguments presented below will be more relevant for one or the other. For instance, an entrepreneur planning to grow their business into new markets would seem to benefit more from access to highly skilled labor compared to entrepreneurs with more mundane ambitions. The introduction made the case that “what causes entrepreneurship” can be understood along three dimensions in an economic-geographic analysis: (i) individual characteristics, (ii) market potential and access to inputs, and


(iii) formal and informal institutions. We organize our discussion around these broad categories and close the section with a short note on the spatial scale of analysis.

3.1 Entrepreneurship: Individual Characteristics

Personal characteristics, both acquired and inherited, are important for an individual’s decision to act entrepreneurially. Several personal characteristics and experiences correlate strongly with the decision to form a new venture (e.g., labor market experience, previous wage, age, employment status, and personality types/traits), which means that the spatial distribution of people with respect to those factors should be important for anyone who researches the economic geography of entrepreneurship. Other characteristics are ambiguous in terms of their influence on the average start-up decision but may importantly influence the quality of the entrepreneurial effort (e.g., human capital, previous occupation, and previous employer size). For instance, a robust observation in studies of entrepreneurship is that entrepreneurs with high measures of human capital run more impactful businesses. That is to say, we should expect regions to be high in entrepreneurial potential if they attract individuals with those characteristics or manage to foster those characteristics among their current inhabitants.

In their regional analysis covering 97 percent of the world’s GDP, Gennaioli et al. (2013) conclude that the main factors explaining cross-regional divergence in regional development are human capital and entrepreneurial inputs and their interaction. Our knowledge of how the spatial sorting of human capital impacts the geography of entrepreneurship is also growing. There are, for example, recent studies of the spatial distribution of personality types/traits. Obschonka et al. (2013) employ large datasets from the USA, UK, and Germany and demonstrate substantial correlations between the spatial distribution of a Big Five personality profile deemed to be entrepreneurship-prone and the corresponding distribution of entrepreneurial activity.

3.2 Entrepreneurship: Demand Factors

The population of the local economy reflects, first, the scale and, second, the diversity potential of local demand. Scale is important because it allows more production to be performed locally. Jane Jacobs’ (1969) main argument is built on how imports are substituted away in larger places. Import substitution leads to division of labor and makes the city ever more productive in meeting local demand. New firms, particularly new large firms, may be stimulated by a high local division of labor and may be motivated to invest in businesses with higher fixed costs and higher break-even points than they otherwise would.


As specialization and size are intimately connected, so are size and diversity. The diversity of production of the local economy is directly limited by the extent of the market. Most large-scale services that are difficult to import may serve as good examples. If an opera house is used by 1% of the population and needs 10,000 paying customers per year to sustain its business, then it needs a customer catchment area covering at least 1,000,000 people. In principle the argument is valid for any activity that requires substantial critical mass (e.g., elite sports teams) or that is enjoyed by a small part of the population (e.g., avant-gardist art). A similar argument can be made for the breadth of demand in areas like food and shopping, where the newest and most niched restaurants and stores are mostly patronized in and around the metropolitan landscape. In the literature on immigrant entrepreneurship, for instance, a common argument is that a local density of immigrants may create local demand for specific types of services (restaurants, grocery stores, medical services, etc.) that stimulate immigrant businesses (Aldrich et al. 1985).

The local diversity of products and services also makes it possible for dense regions to sustain an equally impressive range of lifestyles, as well as a swiftly expanding range of ways of life. Hence, potential entrepreneurs with knowledge about these lifestyles, combined with a willingness to put that knowledge to the commercial test, will become increasingly concentrated in the places where those lifestyles are present.

Finally, it should be noted that the role of demand factors certainly depends on the nature of the business. For example, demand-side factors are in general more important in explaining the geography of entrepreneurship in industries or activities that rely on local markets, such as various local service businesses (see, e.g., Andersson et al. 2019b).

3.3 Entrepreneurship: Supply Factors

A key supply-side factor that influences the geography of entrepreneurship is human capital. The skills of the local work force are a key input for new as well as established firms. The range and width of human capital, including the overall skills used in knowledge-intensive production, are typically concentrated in large city regions, and this is one explanation for why large city regions are often described as “hotbeds” of entrepreneurship. Other than implying that access to cutting-edge knowledge is available locally, a more highly skilled local population implies a more skilled pool of potential and current entrepreneurs. These and other factors described in this section contribute to a substantial modernity gap in the nature and type of entrepreneurship observed in different parts of the regional hierarchy (Andersson and Koster 2016).

Aside from traditional supply factors, there are also supply-side factors that relate to the classic agglomeration economies. A key characteristic of the firm in the regional sciences is the idea that it can capture various spillovers from its surroundings via local increasing returns that are external to the firm but internal to places where other firms have already agglomerated. Duranton and Puga (2004) classify the


micro-foundations of such agglomeration economies into sharing, matching, and learning.

Sharing refers to the ability of firms to share input suppliers or large investments in indivisibles. These benefits make investments in city walls, education institutions, and town squares viable, but also mean that firms in dense areas have a richer ecosystem of input suppliers, including specialized legal advice and other business services. Matching refers to the more efficient matching of skills to occupations, tastes to demands, and so on that necessarily results from thicker markets, with consequently more people with specialized profiles and employers with specialized demands. This mechanism implies that cities are simply the place to be for whoever wants to produce (by running a firm or working in one) at the higher end of international value chains. Learning, finally, refers to conscious and unconscious knowledge and information spillovers that are driven by nonmarket interactions. The tendency for the potential for such interactions to diminish particularly swiftly with distance gives geography a unique role in facilitating this type of spillover, an idea to which we return below. This mechanism can further be expected to be more important the more knowledge-intensive the area under study, as this type of spillover matters most when the knowledge that we are trying to piece together is incomplete or very new.

Conceptually, it is natural to assume that these spillovers – whether characterized by sharing, matching, or learning – matter differently for Schumpeterian and Kirznerian entrepreneurship. For instance, it is difficult to see the availability of specialized legal advice as a deal breaker for a new taxi firm, whereas such presence may be a prerequisite for the scale-up or internationalization of a game developer. A general finding in the literature is indeed that the diversity of economic activity in larger city regions appears to matter more for start-up activity in high-tech and knowledge-intensive industries (see, e.g., Andersson et al. 2019b). To be sure, there are also negative effects of being in dense areas, such as congestion and higher prices of land and real estate. When applying a population variable on the right-hand side of a regression, we consequently identify the net of all these effects.

3.4 Social but Real: Local Informal Institutions

The entrepreneurial potential of a region is not automatically realized simply because the fundamental factors are in place. A successful region still requires entrepreneurs who identify and act on opportunities: a process that requires knowledge and courage at the very least, both of which may be supplied via social networks. If a culture conducive to entrepreneurship takes root in a region, we may think of it as an informal local institution in the sense that it “governs” the economic fundamentals of the region.


A burgeoning literature indicates that as places are characterized by different fundamentals, such as varying industrial structures, their social fabrics will start to diverge as well. Over time, regional support systems, legitimacy among the population, and other factors of the local milieu are adapted to be conducive to entrepreneurship in regions that are dense in entrepreneurs (Sorenson 2017). The social aspects of this process have interested researchers from different fields: entrepreneurial ecosystems, entrepreneurship cultures, and regional innovation systems.

An early observer of divergences in local industriousness was Alfred Marshall (1920), who identified what he called industrial districts, within which knowledge is refined over time in a social process (IV.X.7):

Good work is rightly appreciated, inventions and improvements in machinery, in processes and the general organization of the business have their merits promptly discussed: if one man starts a new idea, it is taken up by others and combined with suggestions of their own; and thus it becomes the source of further new ideas.

In Marshall’s analysis, the initial conditions were conceived of as mainly geographic. Natural preconditions had motivated some initial investment. Skills and complementary skills were then attracted, and the local economy was eventually sustained by agglomeration economies.

When Chinitz (1961) analyzed differences in entrepreneurship between New York and Pittsburgh, he departed from observed differences in industrial geography. He suggested that different local industry structures would breed substantially different local entrepreneurs with varying frequencies, and that entrepreneurs would be more effective in industries that were competitively organized, rather than oligopolistic, in nature. He juxtaposed the apparel industry, which produced many entrepreneurs, with the primary metals industry, which produced few. This process was path-dependent and potentially persistent because of intergenerational transmission of entrepreneurial values and because of a culture of entrepreneurship, including positive attitudes to entrepreneurs in the local economy. For these reasons, Chinitz argued, diversified places like New York would have more entrepreneurs than manufacturing towns like Pittsburgh via feedback effects from the local industry structure, where dynamic effects emerged through the transmission of skills and attitudes in places where many entrepreneurs were active.

Following Chinitz’s (1961) analysis and claim that Pittsburgh’s dominance in steel was an important determinant of its subsequent lack of entrepreneurs, Glaeser et al. (2015) use distance to mining sites in 1900 as an exogenous instrument that is strongly correlated with modern-day average firm size. Other studies in this field have analyzed the persistence of regional start-up rates as such and concluded that persistence of entrepreneurship is strong at the regional level. This literature has generally taken the slow change of such regional characteristics over time as suggestive evidence of a “culture” and argued that “the case for culture” is particularly strong in places that have been subject to much turbulence, such as war-torn and communist-run Germany over the twentieth century (Fritsch and Wyrwich 2014).


The learning mechanism described above has one crucial thing in common with network externalities: both occur regularly only among people who are located in close proximity to each other. This particularity of social networks implies that at least part of any network peer effect in entrepreneurship should be observed at the neighborhood level, simply because neighbors are more likely to be in each other’s networks and more likely to rub off on each other’s behavior even in the absence of any formal connections between them. At the same time, it should be recognized that social network effects may also be confined to specific groups of people, even within a neighborhood. For example, there is plenty of evidence that immigrant entrepreneurs draw on resources through their social networks to other immigrants, often of the same ethnic group (Portes 1995). Co-location in small areas may thus not be sufficient for social network effects. In some cases, social network effects require a combination of low social distance and low physical distance. The rather large literature on ethnic enclaves and immigrant self-employment provides several examples of research that builds on such ideas (see, e.g., Andersson et al. 2017; Toussaint-Comeau 2008).

3.5 Local Formal Institutions: Permits and Politics

As pointed out by North (1990), institutions (by which he meant any humanly devised constraint that shapes our conduct) interact with organizations (by which he meant political, economic, and social bodies). The interaction between local informal institutions and political organization remains an understudied area. Like-minded people are prone to sorting themselves into the same places, and places further exert conformity on their people. This conformism of local informal institutions is likely to find its way into formal institutions over longer time periods. There are several avenues through which effects on formal institutions could run, most obviously voting (at the ballot box and with feet), but also informal channels such as complaints and praise, informal network activity, and so on.

By strengthening and amending local formal institutions, a local government may in principle improve its position, either by strengthening entrepreneurship within the local milieu or by attracting new entrepreneurs from other places. In order to understand these effects, we first need to investigate exactly which institutions are determined at the local level. Most notably, local governments tend to have an important impact on regulations, whether in terms of how they are devised or how nationally imposed regulations are enforced. Such regulations include, for instance, building regulations and zoning laws (Andersson and Henrekson 2015).

3.6 The “Correct” Spatial Level of Observation

We should note that the spatial scale of many of the effects outlined in this section remains ambiguous. Historically, the analysis of labor market regions (and before that countries) was a necessity simply because “that’s where


the data were.” As our data sources are becoming increasingly flexible and even fully geocoded, researchers are no longer constrained by administrative units. When we are constrained by data aggregated at a high level, we risk being able to isolate only an outcome and must infer the mechanisms conceptually. There are at least two ways that more disaggregated data sources may help us in this regard.

First, high-resolution data improve our analysis of spillovers and network effects where these need to be inferred. With neighborhood-level data, we can keep higher-level effects constant, but not the other way around, other than imperfectly and inelegantly. For instance, if we are interested in assessing the relative importance of sharing, matching, and learning, then by keeping region-level characteristics constant, we can analyze the learning component in isolation under the assumption that sharing and matching occur at a higher level of the regional hierarchy. Ideally this practice takes us closer to the mechanisms behind human capital spillovers. Other than keeping other key factors constant, there is an empirical reason to analyze neighborhoods as well. The neighborhood level, properly conceived, often generates enough within variation, meaning that we do not have to restrict our analyses to people who move between regions.

Second, and relating to the last point, moving within a region is different from moving between regions. When performing research in the spatial sciences, a move is a wonderful thing, as it signifies an optimization decision. But by that logic, someone who moved within a region is at least content enough with the factors at the regional level not to skip town. Much of the migration literature has not simply focused on inter-regional moves but has defined a move to be inter-regional in the first place. If we are interested in assessing entrepreneurs’ preferences for accessing resources such as urban amenities, for instance, then presumably we may analyze whether their within-region moves correspond to our predictions. This practice will suppress some of the endogeneity issues built into inter-regional migration.

In the entrepreneurship literature, there has been a recent drive to follow parts of the literature on the attenuation of agglomeration externalities (e.g., Arzaghi and Henderson 2008; Andersson et al. 2019a) and to “open up” whole cities and regions in order to analyze entrepreneurship outcomes at a fine spatial level. For example, in an analysis of the geography of the quality of entrepreneurship in the USA, Guzman and Stern (2015) use novel data at a microgeographic level. They conclude that “our results highlight the micro-geography of the quality of entrepreneurship and suggest that clusters of entrepreneurial quality may benefit from being analyzed at a very low level of aggregation” and that “examining the factors that shape the boundaries of high-quality entrepreneurship is an important area for future research” (ibid., p. 39).

4 Concluding Remarks

This chapter has discussed the role of entrepreneurship in economic geography. Regardless of one’s interpretation of the direction of the effect, productive and particularly high-impact entrepreneurship characterizes successful regions and is absent in failing ones. Little regional development seems to take place without an


improved entrepreneurial base, and most if not all “places that don’t matter” (Rodríguez-Pose 2018) seem to be accompanied by weak local entrepreneurship. It seems then that local and national policymakers have much in common with researchers in economic geography: in order to “get it,” they need to make sense of entrepreneurship. We draw four broad conclusions from our analysis.

First, human capital is key. New firms are started by people. The geography of entrepreneurship is fundamentally influenced by how entrepreneurship-prone people sort across regions, by the opportunities that different regions provide when it comes to individuals’ accumulation of knowledge and experience relevant for entrepreneurship, as well as by the social and policy environment that may incentivize and support individual action. A successful region needs to ideally produce, and definitely access, skilled and entrepreneurial individuals that act on perceived opportunities. However, many questions remain about the how.

Second, both the frequency and the quality of entrepreneurship matter. Most empirical researchers have focused their efforts on the determinants and effects of an increased frequency of new firm formation. Small businesses without growth ambitions are often entrepreneurial and important for regional development. But how regional factors influence the emergence and growth ambitions of so-called high-impact entrepreneurs remains an understudied area.

Third, spatial scales can inform the analysis. If we want to get down to mechanisms in our analyses, then finely geocoded data sources may greatly help our efforts. Not only can flexible data help isolate effects that are only present in the microgeography; they can also generate spatial units that produce meaningful statistical variation given a shock (say, the entry of a handful of entrepreneurs) that would be considered noise in a regional measure.

Fourth, entrepreneurship is a field characterized by endogeneity. Many causes and effects of entrepreneurship are subject to intense debate regarding the direction of the effect. Connections between new firm formation and growth belonged here until quite recently, as currently do areas such as the connections between entrepreneurship and a local “culture,” as well as between entrepreneurship and enacted regional policies. Researchers who manage to tease out the direction of causality where our knowledge is missing will be in pole position in the future.

5 Cross-References

▶ Clusters, Local Districts, and Innovative Milieux
▶ Factor Mobility and Migration Models
▶ Knowledge Flows, Knowledge Externalities, and Regional Economic Development
▶ Knowledge Spillovers Informed by Network Theory and Social Network Analysis
▶ Local Social Capital and Regional Development
▶ The Geography of Innovation


References

Acs ZJ, Braunerhjelm P, Audretsch DB, Carlsson B (2009) The knowledge spillover theory of entrepreneurship. Small Bus Econ 32(1):15–30
Aldrich H, Cater J, Jones T, McEvoy D, Velleman P (1985) Ethnic residential concentration and the protected market hypothesis. Soc Forces 63(4):996–1009
Andersson M, Henrekson M (2015) Local competitiveness fostered through local institutions for entrepreneurship. In: Audretsch DB, Link AN, Walshok M (eds) Oxford handbook of local competitiveness. Oxford University Press, Oxford, pp 145–190
Andersson M, Koster S (2016) Are start-ups the same everywhere? The urban–rural skill gap in Swedish entrepreneurship. In: Geographies of entrepreneurship. Routledge, London, pp 140–160
Andersson M, Larsson JP (2016) Local entrepreneurship clusters in cities. J Econ Geogr 16(1):39–66
Andersson M, Larsson JP, Öner Ö (2017) Ethnic enclaves and immigrant self-employment: a neighborhood analysis of enclave size and quality. IFN working paper no. 1195
Andersson M, Larsson JP, Wernberg J (2019a) The economic microgeography of diversity and specialization externalities – firm-level evidence from Swedish cities. Res Policy 48(6):1385–1398
Andersson M, Lavesson N, Partridge MD (2019b) Local rates of new firm formation: an empirical exploration using Swedish data. IFN working paper no. 1290. Research Institute of Industrial Economics
Arzaghi M, Henderson JV (2008) Networking off Madison Avenue. Rev Econ Stud 75(4):1011–1038
Chinitz B (1961) Contrasts in agglomeration: New York and Pittsburgh. Am Econ Rev 51(2):279–289
Duranton G, Puga D (2004) Micro-foundations of urban agglomeration economies. In: Henderson JV, Thisse J-F (eds) Handbook of regional and urban economics, vol 4. North-Holland, Amsterdam, pp 2063–2117
Fritsch M (2013) New business formation and regional development: a survey and assessment of the evidence. Found Trends Entrep 9(3):249–364
Fritsch M, Storey DJ (2014) Entrepreneurship in a regional context: historical roots, recent developments and future challenges. Reg Stud 48(6):939–954
Fritsch M, Wyrwich M (2014) The long persistence of regional levels of entrepreneurship: Germany, 1925–2005. Reg Stud 48(6):955–973
Gennaioli N, La Porta R, Lopez-de-Silanes F, Shleifer A (2013) Human capital and regional development. Q J Econ 128(1):105–164
Glaeser EL, Kerr SP, Kerr WR (2015) Entrepreneurship and urban growth: an empirical assessment with historical mines. Rev Econ Stat 97(2):498–520
Guzman J, Stern S (2015) Nowcasting and placecasting entrepreneurial quality and performance. NBER working paper no. w20954. National Bureau of Economic Research
Henrekson M, Sanandaji T (2014) Small business activity does not measure entrepreneurship. Proc Natl Acad Sci 111(5):1760–1765
Jacobs J (1969) The economy of cities. Random House/Vintage Books, New York
Kerr WR, Nanda R, Rhodes-Kropf M (2014) Entrepreneurship as experimentation. J Econ Perspect 28(3):25–48
Kirzner I (1973) Competition and entrepreneurship. The University of Chicago Press, London
Lee YS (2017) Entrepreneurship, small businesses and economic growth in cities. J Econ Geogr 17(2):311–343
Lindholm-Dahlstrand Å, Andersson M, Carlsson B (2019) Entrepreneurial experimentation: a key function in systems of innovation. Small Bus Econ 53(3):591–610
Marshall A (1920) Principles of economics, 8th edn. Macmillan, London
Minniti M (2005) Entrepreneurship and network externalities. J Econ Behav Organ 57(1):1–27
North DC (1990) Institutions, institutional change and economic performance. Political economy of institutions and decisions. Cambridge University Press, New York
Obschonka M, Schmitt-Rodermund E, Silbereisen RK, Gosling SD, Potter J (2013) The regional distribution and correlates of an entrepreneurship-prone personality profile in the United States, Germany, and the United Kingdom: a socioecological perspective. J Pers Soc Psychol 105(1):104–122
Portes A (ed) (1995) The economic sociology of immigration: essays on networks, ethnicity, and entrepreneurship. Russell Sage Foundation, New York
Rodríguez-Pose A (2018) The revenge of the places that don’t matter (and what to do about it). Camb J Reg Econ Soc 11(1):189–209
Sorenson O (2017) Regional ecologies of entrepreneurship. J Econ Geogr 17(5):959–974
Toussaint-Comeau M (2008) Do ethnic enclaves and networks promote immigrant self-employment? Econ Perspect 32(4):30–50
Wennekers S, Thurik R (1999) Linking entrepreneurship and economic growth. Small Bus Econ 13(1):27–56

The New Economic Geography in the Context of Migration

K. Bruce Newbold

Contents
1 Introduction 1376
2 Migration in New Economic Geography Models 1377
2.1 Putting New Economic Geography Models into Practice: Examples 1379
2.2 Does the New Economic Geography Realistically Portray Migration? 1380
3 Conclusion 1383
4 Cross-References 1384
References 1384

Abstract

The new economic geography (NEG) framework offers a new perspective on regional dynamics and spatial-economic development by addressing the effects of inter-regional migration, trade and transport, the role of agglomeration advantages in large metropolitan areas, and product heterogeneity. Drawing on the existing literature, this chapter explores how migration is embedded within the new economic geography framework and draws upon examples to illustrate its role. Given that migration is conceived only as an individual response to utility maximization by potential migrants, the chapter also discusses shortcomings of how NEG models treat migration.

Keywords

NEG models · Migration · Utility · Regions · Economic geography

K. B. Newbold (*) School of Geography and Earth Sciences, McMaster University, Hamilton, ON, Canada e-mail: [email protected] © Springer-Verlag GmbH Germany, part of Springer Nature 2021 M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_146

1 Introduction

On average, people follow jobs (Partridge and Rickman 2003), with migration addressing skill shortages and reallocating workers and human capital across space. Migration is primarily responsible for changes in regional populations, altering both the size and the composition of a region’s population. Reflecting neoclassical migration theory, worker mobility is a key element of labor market efficiency, with the economic benefits of migration accruing at both the societal (national) and the individual level. Indeed, labor mobility has long been recognized as a mechanism that allows labor markets to adjust to changes in the demand for labor, to better match workers with jobs, and to increase productivity and ultimately the standard of living. For the individual, the economic benefits of migration are largely defined by labor market considerations, increased income opportunities, and the consumption of public services and goods. In particular, workers with specialized skills will have an incentive to move because of potentially greater income gains from a better match between their skills and the job tasks required by firms. From a macroeconomic perspective, migration promotes the adjustment of regional (national) economies through the redistribution of population from high- to low-unemployment areas and from low- to high-productivity areas, increasing overall productivity. In the Canadian context, the Centre for the Study of Living Standards (CSLS) estimated that inter-provincial migration added nearly $2 billion to the Canadian economy in 2006, a contribution of 0.137% of GDP (implying a GDP base of roughly $1.46 trillion), with the gain attributed to increased employment and increased productivity (Sharpe et al. 2007).

Paul Krugman’s new economic geography (NEG) framework (Krugman 1991a, b) offers a perspective on regional dynamics and spatial-economic development by addressing the effects of inter-regional migration, trade, transportation, the role of agglomeration advantages, and product heterogeneity. As noted by Neary (2001, p. 536):

The key contribution of the new economic geography is a framework in which standard building blocks of mainstream economics (especially rational decision making and simple general equilibrium models) are used to model the trade-off between dispersal and agglomeration, or centrifugal and centripetal forces.

The NEG model (also known as the core-periphery model) introduced by Paul Krugman in the early 1990s explains the emergence of agglomerations over space. Although the model has since been expanded to include a larger number of interdependent regions, the original model itself was a relatively simple two-region model with agricultural and manufacturing production sectors. The manufacturing sector produces a range of horizontally differentiated products, with each variety of goods produced by a separate firm with scale economies using labor as the only input. The agricultural sector produces a homogeneous good under constant returns to scale, and farmers are the only input. The model also identifies two types of labor (farmers and workers). Key to the migration component embedded within the NEG framework is the mobility of labor: while farmers are immobile and distributed equally between the regions, workers are free to move from one region to another.

Because of scale economies and transport costs, it is more profitable to produce in the region with the larger market, leading firms to produce their good in just one region. If more firms are located in one region, workers in that region will have a greater variety of goods to choose from in comparison to workers in the other region. Therefore, workers in that region benefit from a higher real income, with the higher income inducing workers to migrate from the lower- to the higher-income region. As workers move between regions, a centripetal force is generated through the movement of workers to be closer to producers (so-called forward linkages) and the incentive for producers to concentrate in the larger market (backward linkages). If these linkages are sufficiently strong to overcome the immobility of farmers (which creates a centrifugal force), the economy will ultimately end up in a core-periphery pattern with all production concentrated in one region.

While NEG models have provided important insights into dispersal and agglomeration forces, what are the consequences of migration? Although migration is an important part of the NEG framework, its role, along with the motivations for migration, has attracted much less attention from researchers. Drawing on the existing literature, the purpose of this chapter is twofold: first, it explores how migration is embedded within the NEG model; second, it discusses shortcomings of how NEG models treat migration. The intent of this chapter is not to evaluate the economic basis of the NEG framework, its extensions, or its applications – these are covered elsewhere in this volume, and multiple reviews of NEG models are available (see, e.g., Felbermayr et al. 2015; Fujita and Mori 2005; Ottaviano and Puga 1998). Instead, the chapter focuses on how migration is treated within the NEG model.

The balance of the chapter is organized as follows. The next section explores how NEG models treat migration. This discussion is followed by examples that emphasize migration, particularly those from the European Union (EU), that have used the NEG framework to look at the implications of EU expansion in the early 2000s. The chapter then explores the limitations of the NEG framework with respect to migration by asking whether the framework realistically portrays migration. With these limitations in mind, the concluding section offers potential avenues for further research and exploration of the framework.

2 Migration in New Economic Geography Models

Based on Krugman’s (1991a, b) new economic geography, regions are defined by immobile (farmers) and mobile (firms and workers, i.e., potential migrants) factors of production, with workers responding to market potential and also affecting market potential (Kancs 2011). In this way, labor, wages, and markets are interdependent, such that changes in one will cause adjustments in another part of the economy, with workers relocating to maximize their utility across space. In this regard, the movement of workers across space within NEG models echoes neoclassical migration models, with workers moving to maximize their utility (typically represented by income in NEG models) and responding to “push” and “pull” factors, echoing much earlier work by pioneering migration researchers including Ravenstein, Lee, and Zelinsky.

NEG models describe the process of spatial agglomeration with workers (and firms) relocating to regions where the market potential is higher. In motivating the migration decision, individuals attempt to maximize their utility across all possible destinations and will migrate if their utility is greater in another region. Individuals face a set of discrete alternatives (regions), such that the decision to migrate to region i is made only if it maximizes individual utility relative to the current location and all other locations:

U_i > U_j for all j ≠ i

That is, the utility of region i is greater than the utility of every other region j. In the NEG framework, utility is based on expected real income net of migration costs, a standard assumption in such models. Migration can then be expressed as a probability based on these utility functions. In attempting to maximize utility, the choice probability of alternative i, P(M_i), is equal to the probability that the utility of choice i, U_i, is greater than all other utilities within the choice set C:

P(M_i) = Pr(U_i > U_j) for all j ∈ C, j ≠ i, with 0 ≤ P(M_i) ≤ 1

This assumes that a “tie” (U_i = U_j) cannot occur. In general, the utility of an alternative is defined as the sum of the deterministic and random components of the total utility, such that

U_i = V_i + E_i

where V_i represents the deterministic component (expected real income) and E_i is the random (stochastic) component. In the two-region case, the probability of migration can therefore be estimated by a logit model such as

P(M_i) = exp(V_i) / [exp(V_i) + exp(V_j)]

which ensures that Σ P(M_i) = 1 and 0 ≤ P(M_i) ≤ 1.
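To make the mechanics of these expressions concrete, the following is a minimal Python sketch of the logit choice probabilities; the utility values are illustrative assumptions, not estimates from any study cited in this chapter.

```python
import numpy as np

def migration_probabilities(V):
    """Logit choice probabilities P(M_i) over a choice set of regions.

    V holds the deterministic utilities V_i (e.g., expected real income
    net of migration costs), one entry per region in the choice set C.
    """
    expV = np.exp(V - V.max())   # subtract the max for numerical stability
    return expV / expV.sum()     # guarantees sum(P) = 1 and 0 <= P(M_i) <= 1

# Two-region example: the destination (second region) offers higher utility.
V = np.array([1.0, 1.4])         # illustrative values only
P = migration_probabilities(V)
print(P.round(3))                # -> [0.401 0.599]: migrating is the likelier choice
```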

The relocation of workers and firms creates two effects that are relatively specific to migrants and that act as push and pull factors for migration. First, backward (demand) linkages influence the location choice of firms: new firms raise the demand for labor in the destination, placing upward pressure on wages and encouraging migration from the origin to the destination as workers respond to higher wages. In turn, in-migration increases the demand for goods, which raises the profitability of firms, prompting additional firms to relocate. Second, forward linkages influence the location of workers: because new firms lower the price index in the destination, the declining cost of living raises real wages and utility, encouraging more workers to in-migrate. In the long run, the resulting in-migration places downward pressure on wages, discouraging further migration. Ultimately, when relative net wages are equalized across regions, labor migration stops; a stylized sketch of this adjustment process follows below.
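The following toy simulation illustrates the adjustment process just described under a strong simplifying assumption: the full NEG machinery (CES preferences, price indices, transport costs) is collapsed into a single assumed real-wage schedule that declines with a region’s share of mobile workers, so this is a hedged sketch of the wage-equalization logic rather than an implementation of the Krugman model.

```python
# Stylized two-region migration dynamics: workers flow toward the region
# with the higher real wage until relative wages equalize.

def real_wage(share, base=1.0, slope=0.5):
    # Assumed congestion effect: the real wage falls as workers crowd in.
    return base - slope * share

share_1 = 0.3                    # region 1's initial share of mobile workers
speed = 0.5                      # migration adjustment speed
for _ in range(200):
    gap = real_wage(share_1) - real_wage(1.0 - share_1)
    share_1 += speed * share_1 * (1.0 - share_1) * gap   # replicator-style flow

print(round(share_1, 3))         # -> 0.5: migration stops once wages equalize
```

Reversing the sign of the slope (so that real wages rise with the local worker share, i.e., agglomeration forces dominate the congestion force) makes the same dynamics tip toward full concentration in one region, the core-periphery outcome described above.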

2.1 Putting New Economic Geography Models into Practice: Examples

NEG models have been applied in multiple cases, with models revealing evidence of a regional adjustment process in the USA (Blanchard and Katz 1992, noted in Ottaviano and Puga 1998) and the European Union. In the EU case, multiple authors have used the NEG framework, particularly in light of concerns about migration between countries as the EU was enlarged (see, e.g., Brandsma et al. 2013; Crozet 2004; Kancs 2011; Pons et al. 2007). There, regional adjustment is less clear despite the existence of wage differences across countries (Decressin and Fatás 1995, noted in Ottaviano and Puga 1998). While it has commonly been assumed that European integration would result in labor migration between and within member states, international migration within the EU is, in fact, comparatively small (Ottaviano and Puga 1998), and it has long been noted that there is a significant discrepancy between the predicted number of migrants and observed migration flows (Kancs 2011), with models predicting much higher levels of inter-country migration than observed.

Most of the early studies were based on reduced form (RF) models, with variables such as wages determined a priori and fixed exogenously. The reduced form approach also ignores general equilibrium feedback effects caused by migration between regions. In such cases, smaller economies may be more sensitive to international shocks, so RF models are not suitable for predicting migration (Kancs 2011). Indeed, these reduced form models predicted that 10–15% of Central and Eastern Europe’s population would move to Western Europe with the expansion of the EU. Given the expected mass migration from east to west with the enlargement of the EU in 2004, policies were put in place to restrict east–west migration, protecting older EU markets from the expected high in-migration. These transitional measures were slowly relaxed, with most of the restrictions on movement removed by 2011. However, the expected mass movement of workers failed to materialize, with Kancs reporting that less than 1% of the workforce relocated in the period after EU enlargement. Was this because of the restrictions on migration, or was it for some other reason?

In an attempt to address the discrepancy between observed and predicted migration flows, modelers turned to general equilibrium frameworks based on the NEG framework. Kancs (2011), for example, looked at international migration within the enlarged European Union; his model results suggest that expansion of the EU and liberalization of migration flows would result in only an additional 1.8–3.0% of the total EU workforce migrating between countries, predictions much more consistent with observed flows. The study results suggested that European migration rates were low despite differences in wages across the region. Further, while net migration was increasing, it was doing so at a decreasing rate. Kancs’ work echoes that of Brandsma et al. (2013), whose model results (also based on the NEG framework) illustrate that migration played a secondary role in regional adjustment.

Kancs’ work had two important policy implications. First, the results suggested that migration within the EU was self-regulating. Second, and as a consequence, restrictions on movement between EU countries were not needed. That is, migration appeared to behave as expected by the NEG framework. In fact, the migration restrictions imposed by the EU may have dampened economic growth in the period immediately after enlargement, given that restrictions on the movement of labor may have distorted the equilibrium allocation of labor across Europe, reducing economic growth and welfare (Kancs 2011).

2.2 Does the New Economic Geography Realistically Portray Migration?

Despite its comparative simplicity in considering migration, Neary (2001) highlights the limitations of the NEG model, including how it treats transportation costs, space, and firm strategies, including completely footloose firms. But models also simplify the real world in order to illustrate possibilities. Although NEG models provide new ways of looking at migration and agglomeration economies, do they provide new insights into the drivers of migration, and do they adequately account for the multiple factors that ultimately drive migration decisions? The simple answer would be “no,” given their reliance on real wages as the motivating factor for individual migration. Within NEG models, individuals are assumed to focus only on current utility levels defined by wages, an assumption that is both restrictive and unrealistic, as migration decisions are frequently made on the grounds of discounted future utility and the social and actual costs of migration, along with a broad array of other factors that can motivate (or diminish) the prospects of movement across space.

In the Canadian context, Coulombe (2006) found that Canadians did not seem responsive to short-run (i.e., business cycle) asymmetric shocks to the economy. It may be that workers are content to “ride out” economic downturns and wait for the economy to improve rather than experience the disruption of relocation. Alternatively, workers may engage in long-distance commuting, living in one province (region) and working in another, moving back and forth over a period of time. Instead, the decision to make an inter-provincial migration appeared to be largely due to longer-term structural differences across Canadian provinces, including better job opportunities (unemployment driven), differences in productivity (migration from low- to high-productivity provinces, reflecting agglomeration effects rather than a response to welfare differentials), and migration from rural to urban areas (Coulombe 2006).

So where does this leave us? Casting back some 50 years or so, neoclassical economics argued that labor migrates in response to inter-regional wage differentials, moving from low- to high-wage regions, echoing NEG approaches. With inter-regional movement, labor supply will decrease in low-wage areas due to out-migration, forcing wages to rise. On the other hand, increasing labor supply in high-wage regions will force wage rates down until they equilibrate across space. Empirical results from countless models and situations have consistently confirmed that wages do indeed exert a significant and positive effect upon migration, with individuals more likely to choose destinations with higher wage rates.

Despite its appeal, however, the macro-adjustment model has been subject to a number of criticisms, most of which can also be directed toward the NEG framework. Foremost among these is the assumption that labor will move from low- to high-wage regions, allowing wage levels to equalize across the system. This assumes, of course, that there are no barriers to migration between regions. However, perfect mobility is rare given market imperfections such as regionally variable worker recognition and accreditation requirements, social transfer programs (including unemployment insurance) that may “retain” workers or, at a minimum, delay the need to migrate, incomplete information on the part of potential migrants (i.e., not knowing all possible alternatives), and “stickiness” in the labor and wage market (i.e., associated with labor unions or minimum wage requirements), all of which serve to complicate or impede the free movement of individuals.

Second, while wages are undoubtedly important in motivating migration, it is unclear whether regional wage levels move toward equilibrium through migration. Persistent regional income disparities in highly mobile countries such as Canada or the USA, for example, suggest that the consequences of inter-regional migration have little to do with the regional equalization rates prescribed in the macro-adjustment or NEG models. Moreover, the choice of the appropriate representation of wages within theoretical and operational models has also been debated. For example, it is likely that only wage earners will respond to income differentials. As such, wages will be less important in explaining migration behavior among those no longer in the labor force, although, admittedly, NEG models have tended to focus on individuals in the labor force.

Third, migration, particularly among those in the labor force, is often associated with movement due to unemployment and in search of employment opportunities or increased wages. However, not everyone is equally likely to migrate. Despite the economic incentives of improved income and labor market outcomes, individuals are often reluctant to pack up and move across the country for a multiplicity of reasons. The existence of other variables and personal factors, which have been observed to have significant effects upon the migration decision, suggests that both the macro-adjustment and NEG models are too simplistic in their reliance upon wages as the motivator of migration. By way of an example, an important variable missing from the macro-adjustment model is unemployment, a problem underscored by experiences during the depression of the 1930s. During this time, positive net migration to rural areas was observed despite the fact that wage rates in urban areas remained considerably higher than those in rural areas, a situation that the wage-differential approach could not explain. The population movements during this time were, however, due to the severe unemployment in urban areas, underscoring the effect of unemployment upon migration decisions.
When applied to the current migration system, higher unemployment in a region should generate higher levels of out-migration, while in-migration should be negatively related to unemployment levels. Still, measures of unemployment in origin and destination regions are often inconsistent in terms of their implied effect and level of significance. This inconsistency reflects, in part, the nature of aggregate data and the simultaneity bias inherent within models where the decision to migrate and the destination choice are modeled within the same equation. However, support for NEG and agglomeration has been found in other studies (e.g., Cushing 1999; Morrill and Falit-Baiamonte 1999) that observed migration leading to increased social and economic polarization, more reflective of a process of cumulative causation. As Coulombe (2006, p. 219) notes:

[with] the exception of the 18–24 age group, migration at the individual level is a structural adjustment since it carries with it very high costs: selling residential assets, relocating the children’s education, and coordinating jobs for a married couple.

Similarly, NEG models do not fully account for distance and other personal factors, including marital status, gender, occupation, and age, all of which are significantly associated with migration propensities. Increasing age, for example, is associated with lower migration propensities, reflecting the decreased time horizon over which the costs of migration can be recouped. Distance, too, has an effect on migration decisions. In addition to the well-known impact of distance decay, the movement of workers across international borders (compared to inter-regional movement), irrespective of the actual distance moved, may be more limited given controls on international migration. Of course, this is less of an issue within the EU, given the Schengen Agreement that enables workers to move seamlessly across countries, although language ability will certainly play a role. Despite NEG-based models having been applied in the international context (e.g., Kancs 2011), Ottaviano and Puga (1998, p. 715) note:

Still, distance is a proxy for other costs of migration, including the cost of acquiring information, biases, or lack of adequate information on alternate possibilities, all of which may limit an individual’s search field, especially among population subgroups such as the less educated, meaning that the likelihood of migration is reduced as distance increases. Conversely, labor market options for the better educated should be wider given lower information costs and greater ability to collect and interpret information on potential destinations. Beyond the financial costs proxied by distance, there are significant nonpecuniary costs, including social costs due to the disruption of social networks and schooling, and reduced contact with family and friends, meaning that decisions about migration are not necessarily economically rationale (Hanson 2014). Fourth, while feedback opportunities exist within the models, they would not capture changing migration rates over time. Declining rates of migration in the USA and Canada – a phenomenon observed since the 1970s in both countries, for example – highlight shifts in societal and personal preferences around migration options. In

The New Economic Geography in the Context of Migration

1383

part, declining migration rates can be attributed to population aging, with an older population less likely to migrate. Declining migration rates have also been attributed to increased labor force participation rates among women and dual-earner households. Instead, rather than investing in migration which results in a permanent relocation, long-distance commuting (including commuting between regions and the phenomena of inter-regional employment and short-term relocation for work where a worker lives in one region but works in another) may occur more frequently. Given that individual workers do not permanently relocate, such long-distance commuting may slow tendencies toward agglomeration. However, NEG models could explore the ability of firms to relocate across space and how such worker participation impacts agglomeration tendencies. Finally, NEG models speak only to the movement of workers across space and their equilibrating actions. This, of course, misses the movement of older adults (nonworkers) and amenity or health-related migrations that occur later in the life course (and/or in response to the life course), with movement also having an impact on the demand for goods in the origin and destination regions. Further, migrations among older adults are often counter to labor force flows, including movements down the urban hierarchy and into smaller urban or rural areas (which, incidentally, leads to a related discussion of the variation in population density within a region and its impact on migration decisions). With aging populations in most developed countries, the flow of older adults will become increasingly important and impact regional economic development through demand-side linkages while having little impact on the supply of labor.

3 Conclusion

Despite the fact that NEG-inspired models do not adequately represent the multitude of motivations for migration, they are not meant to do so. Instead, they capture what might best be described as the basic motivator for migration across regions among individuals in the labor force: increased utility and income. Given some of the limitations discussed above, however, NEG models may ultimately be less suitable for addressing aspects of migration or understanding the broader implications of inter-regional migration. While multiple authors have explored factors related to economic agglomeration by manipulating assumptions around factors such as competition, industrial specialization, and trade costs within existing NEG models (see, e.g., Camacho 2013; Ottaviano and Puga 1998, for fuller discussions), relatively less attention has been paid to the role of migration. In particular, there is a need to expand the discussion away from a focus on real wages as the driver of inter-regional migration to encompass other benefits (and costs) of migration within the utility function. In addition, emphasis on educational profiles, origin regions, the selectivity of migration flows, and the role of migrations made by older (non-working) adults may be beneficial. The challenge, of course, is to build this information into a tractable general equilibrium model.

4 Cross-References

▶ Factor Mobility and Migration Models
▶ Migration and Labor Market Opportunities
▶ New Economic Geography After 30 Years
▶ Spatial Equilibrium in Labor Markets

References

Blanchard OJ, Katz LF (1992) Regional evolutions. Brookings Pap Econ Act 1:1–75
Brandsma A, Kancs d’A, Persyn D (2013) Modelling migration and regional labour markets: an application of the new economic geography model RHOMOLO. VIVES discussion paper. Katholieke Universiteit Leuven, Leuven, p 36
Camacho C (2013) Migration modelling in the new economic geography. Math Soc Sci 66:233–244
Coulombe S (2006) Internal migration, asymmetric shocks, and interprovincial economic adjustments in Canada. Int Reg Sci Rev 29(2):199–223
Crozet M (2004) Do migrants follow market potentials? An estimation of a new economic geography model. J Econ Geogr 4(4):439–458
Cushing B (1999) Migration and persistent poverty in rural America. In: Pandit K, Davies Withers S (eds) Migration and restructuring in the United States: a geographic perspective. Rowman and Littlefield, Lanham
Decressin J, Fatás A (1995) Regional labour market dynamics in Europe. Eur Econ Rev 39:1627–1655
Felbermayr G, Grossmann V, Kohler W (2015) Migration, international trade, and capital formation: cause or effect? In: Chiswick BR, Miller PW (eds) Handbook of the economics of international migration. Elsevier, North Holland, pp 915–1025
Fujita M, Mori T (2005) Frontiers of the new economic geography. Pap Reg Sci 84(3):377–405
Kancs d’A (2011) Labour migration in the enlarged EU: a new economic geography approach. J Econ Policy Reform 14(2):171–188. https://doi.org/10.1080/17487870.2011.577648
Krugman PR (1991a) Increasing returns and economic geography. J Polit Econ 99:483–499
Krugman P (1991b) Geography and trade. MIT Press, Cambridge
Morrill R, Falit-Baiamonte A (1999) Social and economic change and intermetropolitan migration. In: Pandit K, Davies Withers S (eds) Migration and restructuring in the United States: a geographic perspective. Rowman and Littlefield, Lanham
Neary JP (2001) Of hype and hyperbolas: introducing the new economic geography. J Econ Lit 39:536–561
Ottaviano GIP, Puga D (1998) Agglomeration in the global economy: a survey of the ‘new economic geography’. World Econ 21:707–731. https://doi.org/10.1111/1467-9701.00160
Partridge MD, Rickman DS (2003) The waxing and waning of U.S. regional economies: the chicken–egg of jobs versus people. J Urban Econ 53:76–97
Pons J, Paluzie E, Silvestre J, Tirado DA (2007) Testing the new economic geography: migrations and industrial agglomerations in Spain. J Reg Sci 47(2):289–313
Sharpe A, Arsenault JF, Ershov D (2007) The impact of interprovincial migration on aggregate output and labour productivity in Canada, 1987–2006. CSLS research report 2007-02

Further Reading

Poot J, Waldorf B, van Wissen L (2009) Migration and human capital. Edward Elgar, Cheltenham
Nijkamp P, Poot J, Sahin M (2012) Migration impact assessment. Edward Elgar, Cheltenham

Environmental Conditions and New Geography: The Case of Climate Change Impact on Agriculture and Transportation

Sandy Dall’erba and Zhenhua Chen

Contents
1 Introduction 1386
2 Impact on Agriculture 1387
2.1 Current State of the Literature 1387
2.2 Challenges Ahead 1388
3 Impact on the Transportation System 1390
3.1 Literature Review 1390
3.2 Climate Change Impact and Resilience 1392
4 Conclusion 1393
5 Cross-References 1394
References 1394

Abstract

While climate change and associated extreme weather events are expected to affect all the sectors of the economy, agriculture and transportation will be especially impacted. The current agricultural production mix reflects current local factor endowments in terms of soil characteristics, climate conditions, and water access, as trade theory would suggest; yet it is very likely that future climate conditions will disrupt these local competitive advantages and lead to new production choices and trade patterns. When it comes to transportation services, disruption caused by extreme weather events has become more frequent in recent years. It has a direct influence on airport capacity and flight routes, which can cause cascading delays throughout the entire aviation system. Because weather is considered the primary factor behind airline delays both in the USA and in China, the transportation sector is particularly keen on adapting its activities to new weather conditions. As such, this chapter provides a review of the current state of the literature and lists paths for future research on this topic.

Keywords

Climate change · Agriculture · Transportation

S. Dall’erba (*) Department of Agricultural and Consumer Economics and Regional Economics Applications Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL, USA e-mail: [email protected]
Z. Chen Department of City and Regional Planning, The Ohio State University, Columbus, OH, USA e-mail: [email protected]
© Springer-Verlag GmbH Germany, part of Springer Nature 2021 M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_147

1 Introduction

As the global environment warms due to an increase in greenhouse gas emissions, the number of extreme weather events has increased substantially compared to decades ago (Intergovernmental Panel on Climate Change 2014). Climate change leads to various negative consequences, such as more extreme precipitation events, the alteration of mid-latitude storm tracks, changes in the frequency and intensity of hurricanes, and other disruptions of prevailing weather patterns. Naturally, agriculture, an economic sector that is very sensitive to weather conditions, will be greatly affected by such changes regardless of geographical region (Schlenker et al. 2005, 2006; Deschênes and Greenstone 2007). Yet some regions within large agricultural producers (e.g., the USA, China, the European Union) will fare better than others, and the overall net gain or net loss to the agricultural sector of these countries is still unclear (Deschênes and Greenstone 2007). Indeed, while the current agricultural production mix reflects current local factor endowments in terms of soil characteristics, climate conditions, and water access, as trade theory would suggest, it is very likely that future climate conditions will disrupt these local competitive advantages and lead to new production choices and trade patterns. Moreover, for some countries, this dramatic shift will increase food insecurity.

Another important sector that is expected to experience major changes due to the increasing occurrence of extreme weather events is transportation. Disruptions of transportation services caused by extreme weather events have become more frequent in recent years. In fact, weather is considered the primary factor behind airline delays both in the USA and in China. Severe weather conditions have a direct influence on airport capacity and flight routes, which can cause cascading delays throughout the entire aviation system. In addition, other modes of transportation, such as roadway, railway, seaport, and public transit, also face the increasing challenge of various types of severe events. A national survey of public transit agencies in the USA reveals that extreme snow storms and flooding have caused the most severe impacts on transit agencies. These events may cause various consequences, such as heavy delays of transit service, temporary shutdowns, and damage to vehicles or equipment. In sum, transportation agencies were found to be often unprepared to deal with the powerful and cascading effects of extreme weather conditions, which can lead to a severe disruption or a complete shutdown of transportation infrastructure and operations.


Other aspects of climate change include its effects on the labor market, inter-regional and international migration, nutrition and health, air pollution, and water management (Bae and Dall’erba 2018). Effects on other natural systems, such as wildlife and plant abundance, distribution, and biology, have also been well documented, but they will not be addressed in this chapter due to space limitations.

2 Impact on Agriculture

2.1 Current State of the Literature

The attention generated by the impact of the Summer 2012 drought on the Corn Belt exemplifies how vital it is for the US agricultural sector to understand climate change and how to mitigate and/or adapt to it. While there is little controversy about whether agriculture is sensitive to changes in climatic conditions, significant uncertainty exists with respect to the impact of future climate on agriculture. It is anticipated that some regions will be winners and others losers, but it is still unclear whether climate change will bring a net gain or a net loss for US agriculture as a whole (Mendelsohn et al. 1994; Deschênes and Greenstone 2007), as production currently spans a variety of climate zones over all the lower 48 states and occupies up to 42% of the US territory.

Among the studies of climate impacts on US agriculture, three types of models can be identified. The first is the crop growth simulation model. Based on agronomic (biophysical) models, their strength lies in their capacity to simulate crop growth over the life cycle of a plant exposed to the full range of weather outcomes, including extreme events. They also simulate how changes in climate modify a crop’s input requirements, such as fertilizers and irrigation. However, this approach has been criticized for the large number of calibrated parameters it relies on, its lack of validity on a global basis, and its failure to account for farmers’ capacity to adapt their crop choice to climate. It should be noted that the Intergovernmental Panel on Climate Change (IPCC) uses the crop model approach in conjunction with the Basic Linked System (BLS) of National Agricultural Policy models, a world-level equilibrium model system, to estimate the impact of climate change on food production. This approach is of the highest complexity and requires coordination among several groups with large computational capacity.

The second type of model uses regression analysis to estimate the impact of climate and other exogenous inputs (such as soil quality) on one type of crop. This approach requires significantly less data and computational resources than the simulated crop growth models but does not consider the ability of farmers to adapt their choice of inputs and crops to changing climate conditions. This framework is particularly relevant in developing countries, where it is difficult to assume that farmers have the private capital or government support necessary to adapt their farming practices. However, when it comes to the USA, empirical evidence clearly demonstrates that adaptation at the farm level is already taking place. The 1996 report of Schimmelpfennig et al. indicates that “some of the alternatives considered are adoption of later maturing cultivars, change of crop mix, and a timing shift of field
operations to take advantage of longer growing seasons.” Furthermore, adaptation is not limited to crop producers. Schimmelpfennig et al. (1996) also report that “the growth of dairy in the South is a testament to the creativity of farmers in finding ways to cool animals in hot climates (for example, shading, wetting, circulating air, and air conditioning). Other adaptations include herd reduction in dry years, shifting to heat-resistant breeds, and replacing cattle with sheep.” Another reason that makes it difficult to justify a crop-production approach in the USA is that the regression framework used in this literature treats each crop individually. By definition, it does not account for the fact that crops are mutually dependent through factors such as crop rotation (practiced for 85% of the corn and 75% of the wheat in the USA over 1990–1997) or access to inputs (land, water, fertilizers, and government subsidies), whether crops share the latter or compete for them. Finally, we believe that the prospect of an uncertain climate drives the desire to make public and private investment decisions that mitigate the impact of climate and increase the chance of adaptation. For instance, mitigation policies, such as cap and trade on CO2 emissions linked with agricultural offsets, are already under consideration in the USA.

Based on these observations, a third theoretical framework, called the Ricardian approach, was initiated by Mendelsohn et al. in 1994. This regression-based approach relies on the assumption that landowners, well aware of their local production conditions, allocate their land to the most rewarding use. It is the opposite of what Polsky (2004) calls the “naïve farmer,” who would blindly continue to plant the same crop even though new climatic conditions would be more propitious to another, more profitable crop. This framework has attracted much attention in analyses of the US agricultural sector (e.g., Schlenker et al. 2005, 2006; Deschênes and Greenstone 2007), partly because decisions on which crop to plant, how much of each input (fertilizers, herbicides, etc.) to use, and what tillage/management technique to adopt are determined endogenously and will be reflected in the value of farmland and/or agricultural profits, the usual dependent variables in a Ricardian regression framework.

Based on this idea, the literature is not unanimous about the impact of climate change on US agriculture, indicating the need for further developments in the conceptual framework and estimation strategies used in this field. Indeed, compared to the original work of Mendelsohn et al. (1994), who find an annual gain in agricultural production of $1–$2 billion, Schlenker et al. (2005) control for the role of irrigation and suggest a loss of about $5–$5.3 billion per year. Based on a different definition of dryland counties and different climate change scenarios, Schlenker et al. (2006) find an annual loss of $3.1–$7.2 billion. In contrast, Deschênes and Greenstone (2007) report a $1.3 billion annual profit, consistent with Mendelsohn et al. (1994). There are other Ricardian studies that focus on North America (see, for instance, Polsky 2004; Dall’erba and Dominguez 2016), but they rely on only a part of the entire sample of US counties.
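To fix ideas, the following is a minimal sketch of the kind of cross-sectional regression a Ricardian study estimates, run on synthetic county data; the variables, functional form, and coefficients are illustrative assumptions, not estimates from the studies above.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500                                       # synthetic "counties"

temp = rng.normal(14, 4, n)                   # mean temperature (degrees C)
prec = rng.normal(700, 150, n)                # annual precipitation (mm)
soil = rng.normal(0, 1, n)                    # soil-quality index (control)

# Inverted-U climate response often reported in this literature: farmland
# value peaks at an interior optimum temperature. Coefficients are made up.
value = (200 + 30 * temp - 1.0 * temp**2 + 0.15 * prec
         + 25 * soil + rng.normal(0, 40, n))  # farmland value per acre

X = sm.add_constant(np.column_stack([temp, temp**2, prec, soil]))
fit = sm.OLS(value, X).fit()

b_t, b_t2 = fit.params[1], fit.params[2]
print(f"implied optimal temperature: {-b_t / (2 * b_t2):.1f} C")  # ~15 C here
```

The quadratic climate terms recover the inverted-U response built into the synthetic data, and the implied temperature optimum follows directly from the fitted coefficients.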

2.2 Challenges Ahead

This section focuses on two methodological aspects that deserve further attention in climate change impact studies.


Interregional spillovers: While econometric analyses of climate change impacts have undergone remarkable developments over the last decade, significant improvements are still needed to model the presence and magnitude of interregional spillovers and to assess the extent to which an outcome in one area is driven not only by local weather/climate changes but also by changes in neighboring locations (Dall’erba and Dominguez 2016) or in trade partners (Chen and Dall’erba 2019). Consideration of spillover effects is important because it provides policy-makers with an enhanced understanding of the structure of climate change impact: a mitigation or adaptation strategy might generate benefits in surrounding regions and vice versa. However, the spillover effects between two regions are rarely symmetric, which makes it difficult to identify spatially targeted adaptation/mitigation policies. In addition, various channels of transmission (e.g., CO2 emissions transmitted through trade and atmospheric channels) between any two neighboring regions can either augment or offset exchange imbalances.

With the only exceptions of Polsky (2004) and Dall’erba and Dominguez (2016), the current literature on the USA treats spatial autocorrelation through a spatial error specification, which assumes that only the unobserved factors are spatially correlated (as in Schlenker et al. 2005, for instance), or through Conley’s (1999) nonparametric method (as in Deschênes and Greenstone 2007). These two approaches do not account for the potential presence of spatial dependence among the regressors, i.e., for spatial spillovers. For instance, rain in one locality can provide water for irrigation elsewhere and affect agricultural production in another location. Similarly, it is acceptable to assume that not all agricultural goods produced in one location will be consumed locally. Some will be sold elsewhere; hence, consumption behaviors far away from the place of production still matter (for instance, 70% of Arizona-produced agricultural goods are for the rest of the USA or for export; Bae and Dall’erba 2018). These two examples of spatial spillovers take place across variables commonly included in a Ricardian model (rain and consumption); hence they should be modeled accordingly. If not, the model suffers from a missing-variable bias that affects both the coefficient estimates and the accuracy of the standard errors, as is well known from the spatial econometric literature (LeSage and Pace 2009) and from the hedonic literature the Ricardian approach is based on. Polsky (2004) calculates the marginal effects of his spatial lag model as β, as in a nonspatial framework, while they should be measured as (I_n + ρW + ρ²W² + …)β = (I_n − ρW)⁻¹β. As such, one needs to rely on the decomposition method brought to the fore by LeSage and Pace (2009) to separate the marginal direct effects from the marginal indirect effects (i.e., spillovers) and draw appropriate conclusions; a numerical sketch of this decomposition appears at the end of this section. Only Dall’erba and Dominguez (2016) have used this approach in a Ricardian setting so far.

Spatial heterogeneity: Traditionally, if heterogeneity is not accounted for in an econometric model when it should be, it leads to inconsistent estimates. The current Ricardian literature offers a fairly straightforward approach to dealing with spatial heterogeneity. It consists in defining a minimum of two groups, calculating the marginal effects associated with each group, and running a Chow test to verify their dissimilarity.
For instance, Schlenker et al. (2005) use a cutoff equal to 10% of the harvested cropland under irrigation to delineate the dryland versus the irrigated regions, while Deschênes and Greenstone (2007) use 20%. The latter paper also offers a spatial heterogeneity based on state boundaries. More recently, Dall’erba and Dominguez (2016) focus on differences in elevation (across the south-western US counties only). Needless to say, the Chow test results reveal in all of them that the structural break is significant. However, some alternative ways of modeling heterogeneity, possibly closer to the actual data-generating process, deserve to be explored further in the climate change literature.

The first such option is the hierarchical model, which accounts for the fact that the outcome variable depends on variables organized in a nested hierarchy. Indeed, the “lower-level” (county-level) data are actually nested within “higher-level” data. Two types of higher-level groupings could be used: the first refers to the location and size of the watershed each county belongs to, and the second to the location and size of the climate region each county belongs to. There are 18 watersheds and 9 climate regions in the continental USA, as identified by the USGS (United States Geological Survey) and NOAA (National Oceanic and Atmospheric Administration), respectively. From an econometric point of view, within- and between-group interactions need to be accounted for; otherwise the estimates are inefficient.

The second option models and measures spatial heterogeneity by means of geographically weighted regression (GWR) or its most advanced version, the multiscale GWR (Fotheringham et al. 2017), which provides climate impact estimates for each individual county. Applying this approach to the Ricardian framework has never been done, yet it would be noteworthy because it would show how adaptation takes place at different speeds and to different extents across the USA. Speed and extent are determined by a set of local climatic and economic conditions, but also by the type of agricultural output(s) that dominates local production, as in the crop production literature. As a result, the combination of these two theoretical frameworks would allow us to compare each county’s yield to that of its local neighbors and would give us more insights into the type of adaptation strategy that is relevant to each locality.
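As a companion to the discussion of spillovers above, here is a minimal numerical sketch of the LeSage and Pace (2009) decomposition of marginal effects in a spatial lag model; the weight matrix and parameter values are illustrative assumptions.

```python
import numpy as np

# Spatial lag model y = rho*W*y + X*beta + e. The matrix of marginal
# effects of a regressor is S = (I_n - rho*W)^(-1) * beta; its diagonal
# carries the direct (own-region) effects and its off-diagonal elements
# the indirect effects (spillovers). rho, beta, and W are made up here.
n, rho, beta = 5, 0.4, 2.0
W = np.ones((n, n)) - np.eye(n)
W /= W.sum(axis=1, keepdims=True)        # row-standardized weight matrix

S = np.linalg.inv(np.eye(n) - rho * W) * beta

direct = np.diag(S).mean()               # average direct effect
total = S.sum(axis=1).mean()             # average total effect = beta/(1 - rho)
indirect = total - direct                # average spillover (indirect) effect
print(round(direct, 3), round(indirect, 3), round(total, 3))
```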

3 Impact on the Transportation System

3.1 Literature Review

A large number of studies have attempted to examine the impact of extreme weather conditions on transportation performance from various perspectives, including engineering, economics, and geography. For instance, engineers often focus on assessing the vulnerability of the transportation system, with an emphasis on understanding the extent to which the performance of a transportation infrastructure is affected by a changing climate. In the field of economics, on the other hand, work on the impact of extreme weather conditions on transportation focuses primarily on the evaluation of both direct and indirect economic losses and relies on either econometric analysis or
computable general equilibrium models. Finally, for geographers, the impact of environmental conditions on transportation is foremost evaluated by calculating either the change in spatial accessibility due to the disruption caused by weather-related events or the change in transportation performance.

One should note that while most studies attempt to provide an in-depth assessment of the impact of climate change on transportation, most of these assessments focus on a single mode of transportation. Studies evaluating the impact of extreme weather events across different modes of transportation are even scarcer. One of the reasons for this limitation is the lack of an interdisciplinary approach that would take both climate and socioeconomic scenarios into account. Among the exceptions, Taylor and Philp (2010) provide a qualitative assessment of the impact of climate change on land-based transportation infrastructure, systems, and behavior in Australia. Although their work emphasizes that the key to adaptation in the transportation sector is to understand system vulnerability and develop resilience, it remains unclear how these issues can be assessed quantitatively. Koetse and Rietveld (2009) provide a comprehensive review of the effects of climate change and weather conditions on the transportation sector. Specifically, they indicate that assessing the influence of climate change on a transportation system can be conducted using three approaches: the first consists in comparing transportation systems between regions with very different climate conditions; the second examines the impact of climate change through an understanding of seasonal variations in transportation service; and the third examines the relationship between weather and travel behavior. Another study worth noting is Stamos et al. (2015), who evaluate transportation-related adaptation measures and policy actions for addressing climate change through a case-study review. Although the varying strategies adopted by different modes of transportation were summarized, the study did not provide an in-depth comparison of how different transportation modes react to the occurrence of specific extreme weather events.

The impact of severe weather events on transportation systems has been investigated extensively from the demand perspective. For instance, Tsapakis et al. (2013) examine the influence of severe weather events, such as heavy rain and snow, on urban travel time changes, focusing on the greater London area. Their study finds that severe weather has a strong influence on urban transportation demand due to travel time delays. Similarly, Lu et al. (2014) evaluate how people adapt their inter-city travel behavior during severe weather events in Bangladesh, using a flood event as an example. They find that travel demand is likely to decrease significantly as roads are disrupted by flooding.

Other studies evaluate the impact of extreme weather events on transportation through various indicators. Although the indicator approach provides transportation agencies with a straightforward method to understand the risk to transportation assets from various extreme weather events, it has a major limitation in that the result can be arbitrary, as the criteria of measurement are often derived from experts’ judgments or experiences. In addition, there is also a lack of systematic consideration of the statistical association of various events with the performance of transportation systems.
One of the exceptions is Chen and Wang (2019), in which

1392

S. Dall’erba and Z. Chen

various types of environmental conditions, such as precipitation, fog, thunderstorm, snow, wind speed, and air pressure, are considered. Using both GIS and regression analysis, the study analyzes the extent to which the on-time performance of different transportation systems is affected by these various environmental conditions. Overall, high-speed rail is found to be less vulnerable than aviation to most severe weather conditions.
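To fix ideas, studies of this type typically relate an on-time performance measure to a set of weather covariates in a regression framework. The following stylized specification is an illustration only, not the exact model estimated by Chen and Wang (2019):

$$ OTP_{it} = \beta_0 + \beta_1 Precip_{it} + \beta_2 Fog_{it} + \beta_3 Snow_{it} + \beta_4 Wind_{it} + \beta_5 Pressure_{it} + \mu_i + \varepsilon_{it} $$

where OTP_it is the on-time performance of route (or station pair) i in period t, the regressors are the locally observed weather conditions, mu_i is a route fixed effect, and epsilon_it is an error term. Estimating the same specification separately for high-speed rail and for aviation then allows the mode-specific coefficients to be compared.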

3.2 Climate Change Impact and Resilience

The literature on the resilience assessment of transportation systems in the context of climate change has been burgeoning in recent years. The notion of resilience derives from the Latin word "resilio," which originally refers to the mechanical concept of rebound (Rose 2004). Given the increasing concern regarding hazard mitigation, disaster preparation, and recovery, the term has been widely adopted in various disciplines such as engineering, economics, and geography. In general, engineers tend to evaluate the resilience of transportation with a focus on mitigation from a predisaster perspective, whereas scholars in the field of disaster economics tend to measure resilience based on postdisaster recovery. For geographers, the emphasis is on both mitigation and recovery. For instance, Cutter et al. (2008) develop a framework called the disaster resilience of place (DROP) model that provides a comparative assessment of disaster resilience at the local or community level. However, the framework has not yet been applied to the evaluation of the resilience of transportation systems to climate change.

In terms of the measurement of resilience in transportation, various approaches have been developed. For instance, some scholars suggest that the system performance resilience of transportation can be measured in terms of reliability, travel time, speed, and vehicle counts. Tamvakis and Xenidis (2012) discuss the parameters affecting transportation resilience and propose a framework for measuring resilience from an engineering perspective. Leu et al. (2010) estimate the resilience of three types of ground transportation systems (train, tram, and street car) in Melbourne and use network analysis for that purpose. The resilience level is computed based on the interdependency and interactions of various transportation networks. Similarly, the resilience of freight transport systems has been assessed from the perspective of the network's inherent ability, with resilience measured as a system performance optimization problem that considers topological and operational attributes. Hence, one should note that resilience in all these analyses is essentially a system performance measure, in which resilient ability is measured as technical network performance from an engineering perspective. The primary goal is to enhance the pre-event capability of a transportation system to withstand a disaster (i.e., its robustness).

Economic resilience, on the other hand, measures the ability of a system to recover from a disruption. The focus is on the most effective resilience strategies, from both the demand and the supply sides, to reduce the economic losses in terms of gross domestic product (GDP) and employment after a disaster/disruption. In a transportation system, economic resilience can be implemented through several tactics, such as modal substitution, conservation of service supply, excess capacity, and rerouting. Decision makers can apply these various strategies to suppliers, such as mass transit providers and public transportation authorities, or to customers, such as businesses and individuals. Economic resilience differs from system performance resilience in that the former evaluates the effectiveness of various resilience tactics in reducing economic losses, whereas the latter primarily addresses mitigation strategies for the infrastructure system through optimized solutions that improve network topology and system design. Furthermore, economic resilience is less concerned with system output than with the service's contribution to the economy. For example, if the predisaster level of production can be attained with reduced transportation services, say via telecommuting or walking, then economic resilience is achieved.

In addition, transportation resilience and vulnerability have also been examined extensively from the perspective of regional science, with a focus on the roles of connectivity and accessibility in relation to transportation security. For instance, economic resilience and accessibility could be analyzed from a spatial perspective in order to examine the correlation between resilience and accessibility. Scholars generally believe that transportation resilience is more difficult to measure than transportation vulnerability, because the latter is more concrete in real-world networks with links, areas, and morphological features. They suggest that one possible measurement is to identify resilience/vulnerability characteristics using various topological network metrics such as clustering, degree centrality, and correlated networks. In fact, such a measurement and analysis is termed "spatial economic resilience" by Modica and Reggiani (2014), and the concept has linkages to both stability (engineering resilience) and adaptivity (ecological and economic resilience).
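The distinction between the two notions can be made concrete with stylized formulas; both are illustrative conventions from the resilience literature rather than the exact definitions used by the studies cited above. System performance resilience is often summarized by the area under the performance curve over the disruption-recovery period,

$$ R_{sys} = \frac{\int_{t_0}^{t_1} Q(t)\, dt}{Q_0 \,(t_1 - t_0)}, $$

where Q(t) is network performance (e.g., throughput or average speed) at time t, Q_0 is the pre-event performance level, t_0 is the moment of disruption, and t_1 the moment of full recovery. Economic resilience, in the spirit of Rose (2004), is instead the fraction of the maximum potential output loss that is avoided,

$$ R_{econ} = \frac{\Delta Y^{max} - \Delta Y^{actual}}{\Delta Y^{max}}, $$

where Delta Y^max is the output (e.g., GDP) loss that would occur in the absence of any adaptive response and Delta Y^actual is the loss actually observed once tactics such as rerouting or modal substitution are deployed.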

4 Conclusion

Changes in the global climate and environment raise many new challenges for human society. Solutions to address these challenges require not only an understanding of the existing links between environmental conditions and human systems but also the identification of the current methodological and conceptual gaps that future research needs to work on. As indicated in this chapter, in spite of the increasing number of studies evaluating the impact of climate change on agriculture and transportation, major limitations are still present and deserve further attention. A major flaw is that most impact assessments use evaluation techniques developed decades ago, whereby spatial units are treated independently from each other or a single hazard event/single transportation mode is investigated. Therefore, there are still many opportunities to quantify the impacts of climate change from a system and multihazard perspective.

Furthermore, future research efforts should also pay more attention to the adoption of an interdisciplinary approach when assessing the impact of climate change. For instance, engineers should work more closely with economists to incorporate better estimates of direct damages and to prioritize reconstruction efforts in their analysis. In addition, economists should be more sensitive to the actual geophysical processes, such as up- and downstream (or up- and downwind) relationships, that drive spatial dependence across agricultural areas. Increasing awareness of today's challenges is a necessary step to make future adaptation and mitigation strategies more effective and efficient than past ones.

5 Cross-References

▶ Climate Change and Regional Impacts
▶ Land-Use Transport Interaction Models
▶ Rural Development Policies at Stake: Structural Changes and Target Evolutions During the Last 50 years
▶ Supply Chains and Transportation Networks

References
Bae J, Dall'erba S (2018) Crop production, export of virtual water and water-saving strategies in Arizona. Ecol Econ 146:148–156
Chen Z, Dall'erba S (2019) The U.S. interstate trade will overcome the negative impact of climate change on agricultural profit. Regional Economics Applications Laboratory discussion paper 19T-1, University of Illinois at Urbana-Champaign
Chen Z, Wang Y (2019) Impacts of severe weather events on high-speed rail and aviation delays. Transp Res Part D: Transp Environ 69:168–183
Conley TG (1999) GMM estimation with cross sectional dependence. J Econom 92:1–45
Cutter SL, Barnes L, Berry M, Burton C, Evans E, Tate E, Webb J (2008) A place-based model for understanding community resilience to natural disasters. Glob Environ Chang 18(4):598–606
Dall'erba S, Dominguez F (2016) The impact of climate change on agriculture in the South-West United States: the Ricardian approach revisited. Spat Econ Anal 11(1):1–19
Deschênes O, Greenstone M (2007) The economic impacts of climate change: evidence from agricultural output and random fluctuations in weather. Am Econ Rev 97(1):354–385
Fotheringham AS, Yang W, Kang W (2017) Multiscale geographically weighted regression (MGWR). Ann Am Assoc Geogr 107:1247–1265
IPCC (2014) Climate change 2014: synthesis report. In: Core Writing Team, Pachauri RK, Meyer LA (eds) Contribution of Working Groups I, II and III to the fifth assessment report of the Intergovernmental Panel on Climate Change. IPCC, Geneva
Koetse MJ, Rietveld P (2009) The impact of climate change and weather on transport: an overview of empirical findings. Transp Res Part D: Transp Environ 14(3):205–221
LeSage JP, Pace RK (2009) Introduction to spatial econometrics. Taylor and Francis/CRC, Boca Raton
Leu L, Abbass H, Curtis N (2010) Resilience of ground transportation networks: a case study on Melbourne. Paper presented at the 33rd Australasian Transport Research Forum conference, Canberra, 29 September–1 October
Lu QC, Zhang J, Peng ZR, Rahman AS (2014) Inter-city travel behaviour adaptation to extreme weather events. J Transp Geogr 41:148–153
Mendelsohn R, Nordhaus WD, Shaw D (1994) The impact of global warming on agriculture: a Ricardian analysis. Am Econ Rev 84(4):753–771
Modica M, Reggiani A (2014) Spatial economic resilience: overview and perspectives. Netw Spat Econ 15(2):211–233
Polsky C (2004) Putting space and time in Ricardian climate change impact studies: agriculture in the U.S. Great Plains, 1969–1992. Ann Assoc Am Geogr 94(3):549–564
Rose A (2004) Defining and measuring economic resilience to disasters. Disaster Prev Manag 13(4):307–314
Schimmelpfennig D, Lewandrowski J, Reilly J, Tsigas M, Parry I (1996) Agricultural adaptation to climate change: issues of long range sustainability. Agricultural Economic Report no. AER-740, 68 pp, June
Schlenker W, Hanemann WM, Fisher AC (2005) Will U.S. agriculture really benefit from global warming? Accounting for irrigation in the hedonic approach. Am Econ Rev 95(1):395–406
Schlenker W, Hanemann WM, Fisher AC (2006) The impact of global warming on U.S. agriculture: an econometric analysis of optimal growing conditions. Rev Econ Stat 88(1):113–125
Stamos I, Mitsakis E, Salanova JM, Aifadopoulou G (2015) Impact assessment of severe weather events on transport networks: a data-driven approach. Transp Res Part D: Transp Environ 34:168–178
Tamvakis P, Xenidis Y (2012) Resilience in transportation systems. Procedia Soc Behav Sci 48:3441–3450
Taylor MAP, Philp M (2010) Adapting to climate change – implications for transport infrastructure, transport systems and travel behavior. Road Transp Res 19:66–79
Tsapakis I, Cheng T, Bolbol A (2013) Impact of weather conditions on macroscopic urban travel times. J Transp Geogr 28:204–211

Social Capital, Innovativeness, and the New Economic Geography
Charlie Karlsson and Hans Westlund

Contents
1 Introduction
2 Social Capital and Innovativeness
3 Social Capital, Innovativeness, and the New Economic Geography
3.1 A Short Introduction to New Economic Geography
3.2 New Economic Geography and Knowledge-Based Growth
3.3 New Economic Geography, Social Capital, and Innovativeness
4 Conclusion
5 Cross-References
References

Abstract

This chapter discusses the relationship between, on the one hand, the knowledge economy's social capital and innovativeness and, on the other hand, the new economic geography, which is mainly valid for the manufacturing economy. The discussion culminates in a proposal for a "new economic geography 2" model, in which the knowledge-producing sector and its relations to other sectors are modeled in a framework of two or more regions. The overarching mechanisms and results are similar in the two models: cumulative causation creates and enhances regional disparities, but the processes are partly different. It may seem somewhat paradoxical that the industrial specialization that results from the original new economic geography model is replaced in the new model by "specialization in diversification," but the knowledge economy's specialization is "subtle" and diverse – and, as it seems, with subsectors related to and dependent on each other. This is all a reflection of the proposition that social capital and innovativeness play a much larger role for growth and development in the knowledge economy than they did in the manufacturing economy.

Keywords

Social capital · Innovativeness · Knowledge economy · Knowledge spillovers · New economic geography 2 · Cumulative causation · Economies of scale · Increasing returns · Diverse specialization

C. Karlsson
Economics, Jönköping International Business School, Jönköping, Sweden
e-mail: [email protected]

H. Westlund (*)
Department of Urban Planning and Environment, KTH Royal Institute of Technology, Stockholm, Sweden
e-mail: [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_149

1 Introduction

The location of economic activities has become a key research area in today's knowledge economy. Variations in the performance and growth of regions are ultimately dependent on a set of relatively immobile resources, of which human and social capital are two essential ones. Even if the bearers of human capital are semimobile, they need to cluster to capitalize on their human capital, since economic activities are embedded in social contexts. Within this key research area, the location of innovative activities and their importance for agglomeration and regional growth have become a central focus. Variations in innovativeness between regions are explained by the spatial accessibility of specialized innovation inputs, foremost specialized and diverse knowledge, the accessibility of demanding customers with a high willingness to pay for novel products, and the prevailing regional institutional and social capital milieu.

The variation between regions in knowledge accessibility is a function of regional variations in the composition of the regional human capital stocks, social capital, the industrial structure, the existence of knowledge-generating organizations, and the quality of the infrastructure capital. The transfer of existing and new knowledge between economic agents in a region is a function of the prevailing proximities, including social proximity, which is in turn a function of the prevailing regional social capital. Geographical proximity is still important for knowledge transfer, despite the information and communication technology revolution of recent decades. When exchanging critical and, specifically, tacit knowledge, actors are inclined to rely on connections that are more parochial in nature, and with whom they interact on a more frequent basis, rather than on their geographically extended network contacts.

When the volume of knowledge activities increases, regions tend to become richer, which increases purchasing power and raises the probability that inventions are turned into innovations. However, when regions grow richer, land prices, rents, and wages tend to increase, which induces less knowledge-dependent activities to relocate to other regions, with the result that land, premises, and labor are freed for expanding knowledge-intensive activities, which then tend to grow further. With increased employment of highly paid knowledge workers, the demand for lower-paid service workers will also increase. Networking and the associated social communication, informal bonds, etc. will over time become more and more important in the knowledge-rich regions. This opens up the possibility that new clusters with specialized human and social capital will emerge in such regions.

Parallel to the increased awareness of the role of social relations and proximity in the knowledge economy's innovation processes, a new paradigm has emerged in the discipline of economics, viz., the new economic geography. At first glance, these two approaches do not seem to have more in common than acknowledging, from various positions, the importance of agglomeration and proximity. Knowledge spillovers, innovation, and social capital are concepts that have no place in traditional models of the new economic geography, and the strict assumptions of the new economic geography have been hard to combine with the far "softer" theories of the knowledge economy and its driving forces. From a general perspective, this is of course very unsatisfactory. This chapter is therefore devoted to the challenging task of trying to link the two perspectives together and to discuss in more detail the relationship between, on the one hand, the knowledge economy's social capital and innovativeness and, on the other hand, the new economic geography.

The chapter is organized as follows: The following section presents the concept of social capital and its links to innovativeness. Section 3 adds the new economic geography and proposes a "new economic geography 2" model. Section 4 gives some concluding remarks.

2 Social Capital and Innovativeness

Ever since Putnam (1993) made the concept well known, social capital has been a very popular expression in research and public debate, but also a concept with a plethora of interpretations and definitions. We use here the definition of Westlund (2006, p. 8), according to which social capital is: "social, non-formalized networks that are created, maintained and used by the networks' nodes/actors in order to distribute norms, values, preferences and other social attributes and characteristics." Social capital can be considered a type of informal institution, i.e., norms and values (e.g., trust) that lack formal status but still influence actors' behavior and activities.

Social capital can be described by several dichotomies. It can be viewed as an asset of individuals as well as a collective asset. An individual connected to a network or organization can benefit from knowledge and information spread by other network members; at the same time, the network/organization benefits from the collective knowledge its members possess. Another division is between bonding and bridging social capital. Bonding social capital refers to the links and values that contribute to keeping a group/network together, while bridging social capital comprises the links to other groups/networks and the values that permit or support the building of such links. The bonding/bridging division has strong resemblances with the division of social capital into internal and external links and values. The latter division has been used by Westlund (2006) to conceptualize the social capital of enterprises and other organizations (see Fig. 1).


Fig. 1 Social capital of the enterprise broken down into different component parts (Source: Westlund 2006, p. 52):

• Social capital internal to the enterprise: links/relations which create and distribute attitudes, norms, traditions, etc., expressed in the form of company spirit, a climate of cooperation, and methods of codifying knowledge, product development, conflict resolution, etc.
• The enterprise's external social capital:
  – Production-related: links/relations to suppliers, product users, and partners in cooperation and development.
  – Environment-related: links/relations to the local/regional environment, to political decision-makers (incl. investment in lobbying), universities, and (non-production-related links to) other enterprises.
  – Market-related: general customer relations built through marketing, customer clubs, programs, etc., and expressed in, e.g., trademarks and goodwill.

In the literature, social capital is often described as an externality from which actors benefit without being able to influence it directly (e.g., generalized trust). However, as illustrated in Fig. 1, from an organization's perspective, social capital is something that can be deliberately built up – social capital is not only an externality but also an internality.

The ability to innovate is critical in every economy, since innovation is a key driver of productivity growth. An innovation can be the result of the localized generation of an idea, behavior, system, policy, program, device, process, product, or service that is new and represents a potential economic value for the economic actor. The emergence of an innovation is the result of an open, dynamic, nonlinear, evolutionary, knowledge-intensive, and time-consuming process in which both internal and external knowledge and actors are combined to create new combinations of resources. A basic underlying assumption is that, in the knowledge economy, a single economic actor cannot innovate in isolation. The open innovation paradigm treats the innovation process as an open process, where ideas, knowledge, and resources valuable for innovation can come from both inside and outside the innovating economic actor. Ideas and knowledge useful for innovation are widely diffused in society (Hayek 1945), which implies that innovating economic actors must be able to identify, access, evaluate, and absorb these ideas and knowledge through various external links. Thus, it is obvious that social capital must play a critical role in the innovativeness of economic actors and regions. This is natural, since innovation is an interactive, iterative process, shaped by a variety of institutional routines and social and political conventions. These routines vary geographically and over time, with some forms being more conducive to fostering learning and innovation than others. The basic aggregated relationship between innovative efforts and inputs and knowledge outputs, i.e., innovations, can be analyzed by means of endogenous growth theory (Romer 1990), the knowledge production function approach (Griliches 1986), or the "technology-gap theory" of technological development. In all these approaches, the indigenous social infrastructure of firms and regions plays a critical role for the efficiency with which innovative inputs can be transformed into innovations.

Turning now more specifically to the relationship between social capital and innovativeness, the core argument is that not only commercial but also social relations and collaborations among firms and other economic agents are drivers of innovativeness. Firms receive both incentives for innovating and valuable information to feed into the innovation process from the vertical dimensions of the value chains, i.e., from suppliers and customers. However, firms also collaborate horizontally in their innovation efforts, in alliances with other firms (including competitors), with universities and research institutes, and with specialized consultancy and R&D firms. Furthermore, the recruitment of personnel from other firms, including competitors, and from universities is an important source of novel ideas and new knowledge for the innovation process. Since world markets are dynamic and customer demands fluctuate, flexible innovation requires firms to connect to a range of diverse sources of ideas and knowledge and to strategically change collaborations to learn in both breadth and depth. Besides the links at the firm level, the external personal relations of managers and employees with friends, colleagues, and business partners, and the related informal exchange of gossip, information, and advice, provide valuable sources of inspiration for innovation processes and increase the innovativeness of firms.

There is a strong tendency toward clustering of knowledge-intensive activities in large, dense regions with superior international accessibility, which tends to stimulate innovativeness in these regions. We can imagine how the dynamics of social capital formation are stimulated by the formation of an increasing number of knowledge links, of which a substantial share are links to other large, dense regions. Many of these links are planned, in the sense that, for example, firms establish relations to researchers at research universities to get access to new knowledge within a specific field or to initiate a research collaboration, while other links emerge spontaneously when people from different organizations meet at business breakfasts, seminars, or after work (see Kourtit and Gordon 2019). What develops over time in large, dense regions is a social capital consisting of a myriad of different social and other networks, with intricate links between the different networks, which change over time and where it is impossible to trace all the social relations involved. Over time there will be a sorting process, where actors invest in productive social relations, while less productive social relations disappear. What we can expect is a tendency for networks to become more specialized and only contain links, i.e., social relations, to actors with a specific interest in the subject of the network. The emergence of such specialized networks will stimulate innovativeness in the short run; in the long run, however, innovativeness will depend upon networks whose subjects are characterized by related variety being brought together to create the new, differentiated knowledge bases needed for the emergence of new ideas and knowledge that can be the foundation for innovations.
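As an illustration of how social capital can enter the knowledge production function approach mentioned above, one may write a regional variant in which social capital shifts the efficiency of knowledge creation. This is a stylized formulation for exposition, not a specification taken from Griliches (1986) or Romer (1990):

$$ I_r = A_r\, R_r^{\alpha}\, H_r^{\beta}\, S_r^{\gamma}, $$

where I_r denotes the innovation output of region r (e.g., patents or new products), R_r its R&D inputs, H_r its human capital, S_r an index of its social capital, and A_r a regional efficiency term; an elasticity gamma > 0 captures the proposition that denser and better-connected social networks raise the efficiency with which innovative inputs are transformed into innovations.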
Naturally, public policies in general, and innovation policies in particular, such as the establishment and/or funding of incubators, science parks, and clusters, also play a role in the formation of social capital, as well as in the development and evolution of local and regional innovation trajectories.

3 Social Capital, Innovativeness, and the New Economic Geography

To analyze the role of social capital for innovativeness in modern knowledge economies, we need a theoretical framework that accounts for the fact that the generation of novel ideas and new knowledge is localized and that functional regions vary in terms of the characteristics of their social capital and the extent and type of their knowledge generation. This fits very well with new economic geography theory, which has as its starting point the observation that economic activities are not evenly distributed across space and that, thus, functional urban regions, and not countries, are the natural units of economic analysis. The fundamental question asked is why only some places grow and prosper; in other words, why do investments made in certain places generate jobs, growth, and prosperity, while similar investments made in what seem almost identical places fail to generate the desired results?

3.1 A Short Introduction to New Economic Geography

By using different versions of a general equilibrium model with the interrelations between three variables – increasing returns due to scale economies, transport costs, and the demand for manufacturing goods – as its cornerstone, new economic geography theory explains why economic activities, through processes that are endogenous in character, concentrate in certain localities and regions and not in others, i.e., it provides a detailed description of the spatial inequalities that emerge (Krugman 1991; Fujita et al. 1999; Johansson and Karlsson 2001). The new economic geography focuses on providing an understanding of the various regional development paths and specialization processes that emerge in an economy as a consequence of the social and institutional contexts in different regions. (For a comprehensive overview of the field of "new economic geography," see Gaspar 2018.) Since the new economic geography framework assumes increasing returns to scale at the plant level and costly transportation, location decisions are no longer trivial. New economic geography theory is fueled not only by models from new endogenous growth theory and new trade theory but increasingly by many other theoretical concepts dealing with the regional distribution of knowledge generation and innovative activities. The underlying ideas go back to Ohlin (1933), who simultaneously tried to explain industrial location and international trade, and to Myrdal (1957), who with his theory of cumulative causation tried to explain the spatial dynamics of uneven regional economic development. The existence of scale economies is a basic explanation for the existence and growth of cities and urban regions. Without scale economies, production and other economic activities would be dispersed to save on transport costs. Explicitly or implicitly, these economic geography theories attach high significance to geographical proximity, but we might add that how well different regions can take advantage of their geographical proximity is a function of the characteristics of their social capital.

The models developed within the new economic geography are in principle cumulative causation models with monopolistic competition. Two main types of two-region new economic geography models can be distinguished:

• The first type of model – the "footloose" model – is based upon the argument that once one region has gained an advantage of some sort, e.g., size, compared to the other region, manufacturing firms will be drawn to it (Krugman 1991). The critical factor is the so-called home market effect. The larger region makes it possible for manufacturing firms producing differentiated products to exploit the advantages of internal economies of scale. Underlying this is a cumulative process, where increasing market size promotes the agglomeration of industry in one region. As this region grows bigger, its market size expands, which attracts more manufacturing firms. Increased market size implies that more differentiated product varieties are produced, which decreases the regional price level and leads to increases in real wages, which stimulates the influx of workers. When new manufacturing firms enter the larger region, mobile labor will be attracted to it and its population will expand. This strengthens the home market effect further, stimulating a further influx of manufacturing firms and labor, and thus creating a virtuous circle of growth in the larger region through forward and backward linkages, which spur the centripetal forces. At the same time, the smaller region will experience a decline. The final spatial distribution of manufacturing industry is determined by the level of transport costs. (A minimal numerical sketch of this model is given after this list.)

• The second type of model is based on vertically integrated industries. Here it is the intermediate goods producers that are the motor of the regional growth process. Manufacturing firms in the advantaged region, whose production is competitive with constant returns to scale, can gain further advantages in the form of lower costs for inputs as input producers locate close to the manufacturing firms, which are exploiting existing home market effects. This implies that there are socially increasing returns as aggregate production rises and that large regions are more productive due to the external effects of the investments of input producers offering a diverse set of input goods and services. An important conclusion here is that diversity and variety in producer inputs can yield economies of scale, even though all individual competitors and firms earn normal profits. Demand and supply externalities working through backward and forward linkages have an important role in stimulating further concentration in the larger region. Manufacturing firms will be attracted to the larger region, since it offers a larger home market effect and lower input costs. Thus, the demand and supply externalities induce a cumulative causation process, where the location of new manufacturing firms and new input-producing firms in an integrated process reinforces existing externalities in the larger region (Johansson and Karlsson 2001).
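The cumulative causation logic of the first (footloose) type of model can be illustrated numerically. The following minimal sketch, written in Python, iterates the standard two-region core-periphery equations of Fujita et al. (1999) – income, price index, nominal wage, and real wage – and lets mobile workers migrate toward the region offering the higher real wage. The parameter values are illustrative only and are not calibrated to any data:

import numpy as np

# Illustrative parameters: expenditure share of manufacturing (mu), elasticity
# of substitution between varieties (sigma), and iceberg transport cost (T)
mu, sigma, T = 0.4, 5.0, 1.5
phi = T ** (1 - sigma)                      # "freeness of trade"
PHI = np.array([[1.0, phi], [phi, 1.0]])    # bilateral trade-freeness matrix

def real_wages(lam):
    """Solve the wage equations given manufacturing shares lam = (lam1, lam2)."""
    lam = np.asarray(lam, dtype=float)
    w = np.ones(2)                          # manufacturing wages (farm wage = 1)
    for _ in range(5000):
        Y = mu * lam * w + (1 - mu) / 2     # regional incomes
        P = (PHI @ (lam * w ** (1 - sigma))) ** (1 / (1 - sigma))  # price indices
        w_new = (PHI @ (Y * P ** (sigma - 1))) ** (1 / sigma)      # zero-profit wages
        if np.max(np.abs(w_new - w)) < 1e-10:
            break
        w = 0.5 * w + 0.5 * w_new           # damped fixed-point iteration
    return w * P ** (-mu)                   # real wages

# Migration dynamics: start from a slight asymmetry and let workers move
lam1, speed = 0.51, 5.0
for _ in range(1000):
    omega = real_wages([lam1, 1.0 - lam1])
    lam1 = min(max(lam1 + speed * lam1 * (1.0 - lam1) * (omega[0] - omega[1]), 0.0), 1.0)
print(f"Long-run share of manufacturing in region 1: {lam1:.2f}")

With low enough transport costs, the initially slightly larger region attracts all manufacturing and the virtuous circle described above runs to full agglomeration; with high transport costs, the symmetric distribution remains stable.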


The basic new economic geography models provide a technique for analyzing and explaining the concentration and dispersion of economic activities in a two-region case as induced by initial combinations of basic parameter values, and they attribute a fundamental role to the home market effect. Product differentiation plays a fundamental role in reducing price competition, which is a strong dispersion force, and thus allows firms to locate where they have access to a regional market with higher purchasing power and associated lower transport costs to reach customers. Regional markets with higher purchasing power have a more than proportional share of firms due to increasing returns to scale at the firm level. In the case of more than two regions, what matters is overall market access rather than local demand (Krugman 1993). This creates a pattern of demand-driven specialization where large central urban regions are net exporters of goods produced under increasing returns and imperfect competition (Fujita et al. 1999).

The new economic geography models suggest that once a clustering process has started, it will tend to reinforce itself by attracting more firms and/or workers. This migration of firms and/or workers changes the relative attractiveness of both the origin and the destination region. The reason is that agglomeration due to external economies results in demand and supply conditions that are better in the cluster than in the smaller region, and so promotes the growth of incumbent firms and attracts the entry of new firms and workers. This growth and entry increase cluster strength and so promote further growth and entry, which begin to accelerate once a cluster has reached a "critical mass." Below the critical mass, there are insufficient mass and interactions to achieve the advantages of a cluster; beyond that level, mass and interaction are sufficient for cluster advantages to exist. Implicitly, the new economic geography models assume that the larger region contains a large enough supply of potential entrepreneurs who will establish new product lines, i.e., new firms, as soon as the market potential has become large enough, or that firms move from the smaller region to the larger region when the home market in the larger region increases.

The new economic geography theory has benefitted from contributions from urban economics, such as the microfoundations of urban agglomeration economies and explanations of the impact of neighborhood effects and spatial externalities on the layout of cities. However, it is important to stress that the basic new economic geography models generally rest on very stringent assumptions, including homogeneity across consumers, workers, firms, and location space, and short-sighted economic actors in terms of migration, which implies that social capital is irrelevant in general, and for innovativeness in particular.
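The notion of overall market access can be made explicit. In the wage equation of the standard model (Fujita et al. 1999), the highest nominal wage that firms in region r can pay while breaking even is determined by a market access term,

$$ MA_r = \sum_s Y_s\, T_{rs}^{1-\sigma} P_s^{\sigma - 1}, \qquad w_r = MA_r^{1/\sigma}, $$

where Y_s is expenditure in region s, T_rs is the iceberg transport cost between r and s, P_s is the price index in s, and sigma is the elasticity of substitution. Regions with good access to large markets with high price indices can thus sustain higher nominal wages, which is precisely what draws firms and workers toward central locations.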

3.2 New Economic Geography and Knowledge-Based Growth

The stylized new economic geography models presented above cannot be used for modeling knowledge-based regional growth. The basic reason is that Krugman focused on tangible factors and did not model technological externalities. The new economic geography is able to explain regional agglomeration and specialization during industrialization (100–150 years ago in the developed economies and during the last four decades in China). However, as Krugman himself pointed out in a talk at the Association of American Geographers, regional specialization in the USA was at its peak around 1920 and has been steadily decreasing since World War II: "The new economic geography style, its focus on tangible forces, seems less and less applicable to the actual location patterns of advanced economies" (Krugman 2011, p. 6).

The reason why the new economic geography is able to explain regional divergence during industrialization but not in the knowledge economy is probably the fact that externalities in the form of social capital and knowledge spillovers tend to be of minor importance in the manufacturing economy. For a worker at the assembly line – the archetype of the manufacturing economy's production method – neither social capital nor knowledge spillovers have any impact on productivity. This can be compared with the teamwork that characterizes the knowledge economy, where knowledge exchange is a part of work and social relations determine the quality of the knowledge exchange (Westlund 2006). The literature on innovation systems convincingly shows that knowledge spillovers are at the core of knowledge-led regional development, but that knowledge generation is highly geographically concentrated, and thus that regions differ substantially in terms of:

• Their volume of knowledge production
• Their absorption capacity
• Their accessibility to external knowledge
• Their social capital and (formal) institutions

This implies that the location of knowledge generation and the structure of the spatial networks for knowledge flows and knowledge spillovers are fundamental factors in modeling knowledge-based regional economic development. Furthermore, it must be observed that each region is characterized by its own unique knowledge mix. Location is indeed crucial to understanding knowledge spillovers, since knowledge generation has been found to be geographically concentrated and the capacity to absorb spillovers of new knowledge is mediated by geographical proximity and the characteristics of the prevailing social capital (Jaffe et al. 1993). Marshall (1920) already identified the exchange of ideas as a type of externality leading to the localization, or clustering, of economic activity. The exchange of ideas is often called technological spillovers and has been used by Henderson (1974) and others to develop explanations for the clustering of economic activity and for differences in income and productivity across geographical space. Large regions offer special advantages in terms of knowledge spillovers, since they combine the localization of clusters in specific industries with industrial diversity, i.e., a range of different industrial clusters.

Thus, we essentially propose here a "new economic geography 2" model that is adapted to the conditions of the knowledge economy. In this model, the possibility of knowledge spillovers in large regions that have gained an initial advantage in knowledge production is the attractor of knowledge-creating and knowledge-using firms, since it offers opportunities to take advantage of increasing returns in knowledge production. Preferably, the exact working of these mechanisms should be analyzed through an explicit formal modeling of the knowledge-producing sector and its relations to other sectors in a framework containing two or more regions, which infers how the creation and transfer of knowledge affects the location of economic activities (Fujita and Thisse 2013), but this is beyond the scope of this chapter.

Although it is indeed true that firms compete and not regions per se, there are still important regional production and knowledge advantages that are pursued by many of the actors within each individual region. This leads to an important premise: novel ideas, new knowledge, and innovations emerge uniquely out of regions not simply because one or the other was endowed with a certain initial stock of factors of production, but because many of the assets necessary to compete are created during the development of industries and clusters. These assets include the technological and entrepreneurial knowledge to exploit certain factors of production, social capital, and regional innovation systems that are designed to create and sustain further advantages. Many of these technological, entrepreneurial, social capital, and institutional assets are not easily transferable between regions.

Furthermore, it seems natural to suggest that it is the size of the regional market potential that determines the probability that new knowledge, i.e., inventions, will be turned into innovations. The underlying reason is that a larger market potential increases the demand for knowledge-intensive products. The larger probability of turning inventions into innovations in large regions gives knowledge-creating and knowledge-using firms producing products under monopolistic competition an extra advantage from locating in large regions. These firms need skilled labor to develop and produce new product varieties, and when the number of these firms increases, the demand for skilled labor also increases, raising wages and stimulating the migration of skilled labor from other, less knowledge-intensive regions to the extent that migration costs allow. Thus, differences in the skill level of workers will have clear implications for their spatial distribution. When more knowledge-creating and knowledge-using firms locate in the large regions, these regions become more attractive for knowledge workers and for other knowledge-creating and knowledge-using firms. More people and more firms in the region increase the market potential, which increases the attractiveness of these regions further, and so on. The possibility of talented people being attracted, trained, and retained in specific locations, however, depends on the quality of the regional labor market as well as on the characteristics of the place, including its social capital, its institutions, and prevailing policies. Thus, the evolution of knowledge-intensive regional labor markets involves a complex interaction of technical, economic, social, institutional, and political forces, which collectively determine how economies function.

As large regions grow, the increased demand for land, premises, and knowledge labor will tend to drive up their prices. Explicit recognition of the land and labor markets and the necessity of commuting suggests that, at some point, the increased costs of larger urban regions – higher rents and higher wages arising from competition for space and labor, and higher commuting costs to more distant residences – will offset the production (and consumption) advantages of diversity.


Fig. 2 The main differences between the original new economic geography model and the "new economic geography 2" model. Common driving forces in both models: transportation costs, economies of scale, factor mobility, and market size.

• NEG 1 (manufacturing economy) – rural regions: stagnation and decline; urban regions: agglomeration and specialization, growth.
• NEG 2 (knowledge economy) – rural regions: overspecialization, lack of diversification, lack of innovation and renewal, externalities are limited and produce no, or weak, effects, stagnation and decline; urban regions: diversification through a multitude of "subtle" specialized clusters, innovation through knowledge spillovers, social capital, and other externalities, growth.

It seems natural to assume that many activities and industries go through a life cycle in which externalities are important only in the early development stages. This implies that they become less dependent on knowledge-intensive labor, an innovation-supporting social capital, specialized knowledge inputs, and knowledge spillovers as they mature (Krugman 1991), and thus there will be strong economic incentives for many of these activities and industries to leave the large regions and move to smaller regions when the relevant prices go up. This increases the opportunities for new knowledge-intensive activities to develop and grow in the large regions. Overall, we can imagine a pattern where knowledge-intensive, experimental, and innovative activities are found in large, diverse, cross-fertilizing urban regions, while standardized production is decentralized to smaller, more specialized (and lower-cost) regions (Duranton and Puga 2001). Figure 2 summarizes the main differences between the original new economic geography model and the new model we propose for the knowledge economy.

3.3 New Economic Geography, Social Capital, and Innovativeness

The above discussion implies that central elements of the process of innovation are regional rather than national. Functional (urban) regions are increasingly regarded as important nodes of innovation, production, consumption, trade, and decision-making, and they play a critical role in global modes of production and transportation. Thus, innovation is a localized process, and innovation systems tend to a large extent to be bounded within functional regions. A fundamental reason for this is the nature of the interaction in innovation systems. The exchange of complex knowledge demands face-to-face interaction, and intensive and frequent face-to-face interaction takes place mainly within functional regions, which for practical purposes may be approximated by commuting regions (Andersson and Karlsson 2004).


However, the analysis must integrate the linkages that transfer ideas and knowledge within and between regions (Berliant and Fujita 2009). Geographical space is an essential feature of knowledge creation and diffusion (Fujita 2007). People living and working in a region interact more frequently intraregionally than interregionally and develop a joint social capital consisting of a set of interrelated social networks. People interact most frequently with those who live and/or work in close geographical proximity and with whom they share backgrounds, interests, education, employers, affiliations, and so on, i.e., people in close social proximity. This kind of interaction or networking requires investments in social capital, i.e., social communication and informal bonds, as well as training and education. Building up and operating effectively in social networks requires time and effort. Geographical proximity strongly influences the durability of interaction links by reducing the costs of maintaining them. As every region consists of many localities, and individuals differ in their place of residence and place of work, the social network structures of individuals vary, and thus also their access to social capital.

Each region tends to develop its own social capital, which facilitates the dispersion of knowledge between regional actors and forms the base for important learning processes on which innovative performance is built. Innovative economic agents are change agents who shape local environments, institutions, and social capital. They develop local resources and social relations that further their own interests and create a local environment that is positive for innovation. However, regions and localities harbor multiple social contexts, and not all of them need be equally supportive of learning. The knowledge generation and the economy of the region as a whole evolve according to the synergies that result from interaction with other regions, each with its own special social capital. Thus, knowledge generation and location are determined interdependently (Duranton and Puga 2001).

We can assume that well-connected social networks in a region will promote the productivity of knowledge creation and, thus, innovativeness. Improvements in the communication between different social networks will foster the creation of localities and regions in which social capital promotes productivity and innovativeness (cf. Ottaviano and Prarolo 2009). Better communication allows actors in different social networks to interact and to benefit from knowledge externalities. For knowledge externalities to emerge, the knowledge transferred must be diverse but with related varieties. Relatedness measures the cognitive proximity between actors and thus the potential for knowledge spillovers (Frenken et al. 2007). Related variety induces knowledge transfers between complementary sectors at the regional level and is thus crucial for knowledge generation at the regional level (a standard formalization is sketched below). This adds a new dimension to the role of knowledge diversity and location in the generation and diffusion of knowledge. Specifically, knowledge diversity tends to promote knowledge generation and economic development in knowledge-intensive regions.
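A standard formalization of this idea is the entropy decomposition of Frenken et al. (2007). Let p_i be the employment share of a detailed (e.g., five-digit) sector i, and let the detailed sectors be grouped into broader (e.g., two-digit) groups g with shares P_g = sum of p_i over i in g. Related variety is then the weighted within-group entropy,

$$ RV = \sum_g P_g H_g, \qquad H_g = \sum_{i \in g} \frac{p_i}{P_g} \log_2\!\left(\frac{P_g}{p_i}\right), $$

while unrelated variety is the between-group entropy UV = sum over g of P_g log2(1/P_g). A high related variety indicates the presence of many cognitively proximate subsectors, and it is this component that Frenken et al. (2007) find to be associated with regional employment growth.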
The option to interact face-to-face is the most fundamental aspect of geographical proximity, since it confers four distinct benefits (Storper and Venables 2004): (i) it is an efficient communication technology; (ii) it allows actors to align commitments and thereby reduces incentive problems in uncertain economic milieus; (iii) it facilitates screening and socialization; and (iv) it provides psychological motivation.


However, how well this face-to-face interaction functions depends on the prevailing social capital as well as on the internal spatial structure of regions. At the same time, one should realize that the ongoing face-to-face interaction will over time transform the social capital in a region. In this connection it is also important to remember that face-to-face interaction is costly, since it is time-consuming. Meetings must be set up, people must travel to the meeting place, and the meeting itself takes time. Furthermore, since actors cannot engage in face-to-face interaction with all other actors, they need to screen out the actors with whom they want to interact, which is itself time-consuming. How effectively different actors can screen depends on how well connected they are in formal and informal social networks and on the characteristics and qualities of the actor's social relations.

The exact spatial form of the aggregated social network configurations differs between sectors and regions, not least in terms of the relative importance of local versus interregional networks. Learning is secured through various combinations of local "buzz" and "global pipelines" (Storper and Venables 2004), where "buzz" refers to the information and communication ecology created through face-to-face interactions between groups of actors in a locality or region through their social networks. Actors contribute to and benefit from the exchange of information, gossip, news, and knowledge through their co-location with other actors and their embeddedness in various communities. The combined effect of the four benefits listed above is thus "buzz," which through its super-additivity generates increasing returns for the actors, activities, and sectors involved; to fully reap these benefits, however, actors must normally be co-located in the same locality, cluster, and/or region. This underscores the significance of social capital, which includes relational and cognitive proximity, if geographical proximity is to realize its full potential in terms of knowledge generation, networking, knowledge transfer, learning, and innovation, specifically for knowledge-intensive sectors but also for other sectors. However, this should not lead to an underestimation of the importance of external flows of ideas, information, and knowledge (Rodriguez-Pose and Crescenzi 2008).

As has been stressed several times above, localities and regions are not spatially isolated entities but nodes in many different networks containing other localities and regions. These interregional and international networks offer different channels for knowledge transfers, which determine the spatial range and speed of the transfer of ideas and knowledge between regions. In particular, these linkages – "the pipelines" – feed regions with novel ideas and knowledge generated in other knowledge-generating and innovative nodes. By introducing novel ideas, knowledge, and products into the importing regions, these outside relations guard to some extent against lock-in and against overly rigid, innovation-hampering local networks. The benefits for learning and innovation from non-local "weak" ties have been stressed by Grabher and Ibert (2014). Regional actors consciously plan for and establish linkages to actors in other regions that may be customers, suppliers, or knowledge sources. All these different linkages may carry knowledge to the individual regional actor, but due to distance constraints, it may take time before the new knowledge arrives and still more time before it is diffused through "buzz" in the local social networks. How well the external and the local linkages interplay is a function of local social capital and absorptive capacity as well as of the degree of complementarity between local and external knowledge.

By combining the traditional pecuniary externalities in the new economic geography models with the transfer of ideas and knowledge through intra- and interregional interactions, it might be possible to draw conclusions about a possible circular causality between the migration of knowledge handlers and knowledge flows. Thus, new economic geography models may shed light on the importance of knowledge exchanged between different regions for their innovativeness, where knowledge is exchanged through trade networks, interregional knowledge networks, internal networks of multinational firms, and the migration of innovative firms and knowledge handlers (Broekel et al. 2014), but also through social networks and informal contacts. We can easily imagine how those regions that for some reason have gained an advantage in terms of local and interregional knowledge accessibility will tend to specialize in activities and sectors that are most likely to benefit from face-to-face interaction and thus emerge as innovation hubs, while other regions without such an advantage will tend to specialize in other types of activities and sectors as production hubs. These aggregate specialization processes will be the result of innumerable processes at the micro level, involving not least the emergence, restructuring, and disappearance of numerous social networks with varying degrees of interconnectedness, where the social networks in the innovation hubs will focus on knowledge relevant specifically for innovation, while those in the production hubs will focus on knowledge relevant for productivity.

4 Conclusion

The location of economic activities has become a key research area in today's knowledge economy. Variations in the performance and growth of regions are ultimately dependent on a set of relatively immobile resources, of which human and social capital are two essential ones. Even if the bearers of human capital are semimobile, they need to cluster to capitalize on their human capital, since economic activities are embedded in social contexts. Within this key research area, the location of innovative activities and their importance for agglomeration and regional growth have become a central focus. Variations in innovativeness between regions are explained by the spatial accessibility of specialized innovation inputs, foremost specialized and diverse knowledge, the accessibility of demanding customers with a high willingness to pay for novel products, and the prevailing regional institutional and social capital milieu.

Against this background, this chapter has discussed the knowledge economy's social capital and innovativeness from the perspective of the new economic geography approach. The discussion culminated in a proposal for a "new economic geography 2" model, in which the knowledge-producing sector and its relations to other sectors are modeled in a framework of two or more regions. The overarching mechanisms and results are similar in the two models: cumulative causation creates and enhances regional disparities, but the processes are partly different in the knowledge economy. It may seem somewhat paradoxical that the industrial specialization that results from the original new economic geography model is replaced in the new model by "specialization in diversification," but the knowledge economy's specialization is "subtle" and diverse – and, as it seems, with subsectors related to and dependent on each other. This is all a reflection of the proposition that social capital and innovativeness play a much larger role for growth and development in the knowledge economy than they did in the manufacturing economy.

5

Cross-References

▶ Changes in Economic Geography Theory and the Dynamics of Technological Change
▶ Cities, Knowledge, and Innovation
▶ Clusters, Local Districts, and Innovative Milieux
▶ Geography and Entrepreneurship
▶ Incorporating Space in the Theory of Endogenous Growth: Contributions from the New Economic Geography
▶ Knowledge Flows, Knowledge Externalities, and Regional Economic Development
▶ Knowledge Spillovers Informed by Network Theory and Social Network Analysis
▶ Local Social Capital and Regional Development
▶ New Economic Geography After 30 Years
▶ New Economic Geography and the City
▶ New Geography Location Analytics
▶ Path Dependence and the Spatial Economy: Putting History in Its Place
▶ Regional Growth and Convergence Empirics
▶ Spatial Policy for Growth and Equity: A Forward Look
▶ The Geography of Innovation
▶ The Geography of R&D Collaboration Networks
▶ The New Economic Geography in the Context of Migration

References

Andersson M, Karlsson C (2004) Regional innovation systems in small and medium-sized regions. A critical review and assessment. In: Johansson B, Karlsson C, Stough RR (eds) The emerging digital economy: entrepreneurship, clusters and policy. Springer, Berlin, pp 55–81
Berliant M, Fujita M (2009) Dynamics of knowledge creation and transfer: the two person case. Int J Econ Theory 5:155–179
Broekel T et al (2014) Modeling knowledge networks in economic geography: a discussion of four models. Ann Reg Sci 53:423–452
Duranton G, Puga D (2001) Nursery cities: urban diversity, process innovation, and the life cycle of products. Am Econ Rev 91:1454–1477
Frenken K, van Oort F, Verburg T (2007) Related variety, unrelated variety and regional economic growth. Reg Stud 41:685–697
Fujita M (2007) Towards the new economic geography in the brain power society. Reg Sci Urban Econ 37:482–490
Fujita M, Thisse J-F (2013) Economics of agglomeration: cities, industrial location and globalization, 2nd edn. Cambridge University Press, Cambridge
Fujita M, Krugman P, Venables A (1999) The spatial economy. Cities, regions and international trade. The MIT Press, Cambridge, MA
Gaspar JM (2018) A prospective review of new economic geography. Ann Reg Sci 61:237–272
Grabher G, Ibert O (2014) Distance as asset? Knowledge collaboration in hybrid virtual communities. J Econ Geogr 14:97–123
Griliches Z (1986) Productivity, R&D and basic research at the firm level in the 1970s. Am Econ Rev 76:141–154
Hayek F (1945) The use of knowledge in society. Am Econ Rev 35:519–530
Henderson JV (1974) The sizes and types of cities. Am Econ Rev 64:640–656
Jaffe A, Trajtenberg M, Henderson R (1993) Geographic localization of knowledge spillovers as evidenced by patent citations. Q J Econ 108:577–598
Johansson B, Karlsson C (2001) Geographic transaction costs and specialization opportunities of small and medium-sized regions: scale economies and transaction costs. In: Johansson B, Karlsson C, Stough RR (eds) Theories of endogenous regional growth – lessons for regional policies. Springer, Berlin, pp 150–180
Kourtit K, Gordon P (2019) Spatial clusters and regional development. In: Handbook of regional growth and development theories: revised and extended, 2nd edn. Springer, Berlin
Krugman P (1991) Geography and trade. The MIT Press, Cambridge, MA
Krugman P (1993) The hub effect: or, threeness in international trade. In: Ethier WJ, Helpman E, Neary JP (eds) Theory, policy and dynamics in international trade. Cambridge University Press, Cambridge, pp 29–37
Krugman P (2011) The new economic geography, now middle-aged. Reg Stud 45:1–7
Marshall A (1920) Principles of economics, 8th edn. Macmillan, London
Myrdal G (1957) Economic theory and underdeveloped regions. Duckworth, London
Ohlin B (1933) Interregional and international trade. Harvard University Press, Cambridge, MA
Ottaviano G, Prarolo G (2009) Cultural identity and knowledge creation in cosmopolitan cities. J Reg Sci 49:647–662
Putnam RD (1993) What makes democracy work? Natl Civ Rev 82:101–109
Rodriguez-Pose A, Crescenzi R (2008) Research and development, spillovers, innovation systems and the genesis of regional growth in Europe. Reg Stud 42:51–67
Romer PM (1990) Endogenous technological change. J Polit Econ 98:S71–S102
Storper M, Venables AJ (2004) Buzz: face-to-face contact and the urban economy. J Econ Geogr 4:351–370
Westlund H (2006) Social capital in the knowledge economy. Springer, Berlin/Heidelberg/New York

Section VIII Environment and Natural Resources

Spatial Environmental and Natural Resource Economics

Amy W. Ando and Kathy Baylis

Contents
1 Introduction
2 Spatial Heterogeneity and Optimal Policy
2.1 Spatial Heterogeneity in Conservation
2.2 Effect of Space on Market-Based Solutions
3 Spatial Elements of Nonmarket Valuation
3.1 Hedonic Valuation
3.2 Travel-Cost Analysis
3.3 Stated Preference Valuation Techniques
4 Spatial Empirical Identification Strategies
4.1 Environment and Health
4.2 Evaluations of Protected Areas and Payment for Environmental Services Programs
5 Models of Behavior in Space
5.1 Spatial Sorting Models
5.2 Behavior in Resource Use and Conservation
6 Conclusions
7 Cross-References
References

Abstract

Environmental and natural resource economics has long wrestled with spatial elements of human behavior, biophysical systems, and policy design. The treatment of space by academic environmental economists has evolved in important ways over time, moving from simple distance measures to more complex models of spatial processes. This chapter presents knowledge developed in several areas of research in spatial environmental and natural resource economics. First, it discusses the role played by spatial heterogeneity in designing optimal land conservation policies and efficient incentive policies to control pollution. Second, it describes the role space plays in nonmarket valuation techniques, especially hedonic and travel cost approaches, which inherently use space as a means to identify values of nonmarket goods. Third, it explains a set of quasi- or natural-experimental empirical methods which use spatial shocks to estimate the effects of pollution or environmental policy on a wide range of outcomes such as human health, employment, firm location decisions, and deforestation. Finally, it describes spatial models of human behavior, including locational sorting and the interaction of multiple agents in a land use/conservation setting. The chapter ends with a discussion of some promising future areas for further evolution of the incorporation of space in environmental economics.

A. W. Ando (*) · K. Baylis
Department of Agricultural and Consumer Economics, University of Illinois at Urbana-Champaign, Urbana, IL, USA
e-mail: [email protected]; [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_53

Keywords

Housing price · Marginal damage · Hedonic model · Environmental amenity · Spatial spillover

1

Introduction

Researchers have long recognized that the environment is connected to space. Whether because of the distribution of resource quality across space, differential pollution loads, or site-specific policies, space and location matter in environmental and resource economics. Actions have spillover effects across space; emissions from one place can affect environmental quality in neighboring locations, and fragmentation can degrade the habitat benefits of a given area of conserved land.

Spatial work in environmental and natural resource economics has evolved over time. To take space into account, theoretical work by environmental economists began by including simple spatial resource heterogeneity and contiguity in research on optimal policy design. Initially, heterogeneity was defined as a simple uniform distribution over space, and a single contiguous area was assumed to generate higher ecosystem or habitat benefits than fractured parcels, regardless of proximity or the intensity of intervening land uses.

Much of the early empirical work that used space came in the form of hedonic regressions to value location-specific environmental amenities. As a first step, as with the theoretical work, space was usually defined in terms of distance from environmental features or location in certain polygons of the landscape. Spatial empirical work advanced with the introduction of spatial econometrics (chapter ▶ "Cross-Section Spatial Regression Models"). Many empirical papers in environmental economics began to take space into account, initially treating it as a nuisance parameter that generated spatially correlated error terms (Anselin 2002) instead of as an informational component of the data-generating process.

Later innovation in environmental economics adopted more nuanced and detailed treatments of spatial processes. For example, research began to differentiate between neighbors on the basis of the direction of pollution flows. Detailed modeling of the spatial nature of ecosystem services, such as habitat provision, is also becoming more common in the literature. Thus, instead of simply controlling for spatial interactions based on a predetermined definition of "neighbors," authors are now justifying why and how space might affect their model or empirical results, drawing on relevant literatures on natural processes or human interactions.

The most recent step in the evolution of spatial environmental and natural resource economics is the identification and estimation of strategic behavior over space. The idea that location affects land use has been around since von Thünen (chapter ▶ "Classical Contributions: Von Thünen and Weber"). Recent work allows for human migration in response to transportation costs or differential preferences. For example, we have seen large growth in research on locational sorting. Another literature has begun to explore spatial strategic behavior in the subfield studying land use. Some work addresses how actors respond to land use changes or policies and incorporates those reactions into models that target land for conservation. Recent papers have also begun to take the existence of multiple policy makers, private agents, and possible strategic responses into account to better reflect the multitude of principals and agents that collectively affect land use decisions.

In this chapter, we present knowledge developed in several areas of research in spatial environmental and natural resource economics, emphasizing areas that have been and continue to be foci of active research in recent years. We begin with models of simple spatial heterogeneity, starting with a discussion of optimal land conservation policies and moving on to analyze how spatial heterogeneity affects efficient pollution trading. We next discuss the use of space in nonmarket valuation techniques, especially the hedonic and travel cost approaches, which inherently use space as a means to identify values of nonmarket goods. The third section of the chapter explains a set of quasi- or natural-experimental empirical methods which use spatial shocks to estimate the effects of pollution or environmental policy on a wide range of outcomes such as human health, employment, firm location decisions, and deforestation. Finally, we describe spatial models of human behavior, including locational sorting and the interaction of multiple agents in a land use/conservation setting. We conclude with a discussion of some promising areas for evolution of the modeling of space in environmental economics. While this chapter is by no means comprehensive, it is intended to give the reader a sense of how space is treated in modern environmental and natural resource economics.

2

Spatial Heterogeneity and Optimal Policy

2.1

Spatial Heterogeneity in Conservation

Early work in environmental and resource economics determined how to choose conservation and reserve sites optimally when costs and environmental benefits are heterogeneous across space. Simple computational optimization routines can be used to choose sites or spatially target conservation funds to generate the maximum environmental benefits possible, often taking account of complementarities between multiple parcels in the landscape. With fixed parameters of the problem – budget size, benefits, and costs of conserving the parcels – optimal site selection routines will select sets of parcels that have high benefit-cost ratios (Newbold and Siikamäki 2015); a minimal computational sketch appears at the end of this section.

Analysis of protected-area network design can, however, also account for the role of space in complex ecological processes when such processes have important effects on optimal design. Production of ecosystem services from reserves often depends on the configuration as well as the total area of lands that are protected. Thus, the integer programming models used for optimal reserve-site selection have been enriched to favor patterns of land that display certain levels of agglomeration. Sophisticated versions of this work use programming models to choose cost-effective terrestrial reserves in light of detailed spatial idiosyncrasies of the conservation target at hand. Such a model can include details about the population dynamics (chapter ▶ "Dynamic and Stochastic Analysis of Environmental and Natural Resources") of the species which is the focus of conservation activity, and models of how the species population depends on proximity to certain features of the landscape and on the quality of the unprotected land that lies between reserves. Such models should also incorporate information about spatial heterogeneity in economic use and value. Most such analyses identify the network of lands that maximizes economic surplus in the area while satisfying ecological requirements related to species survival (Newbold and Siikamäki 2015). Some conservation planners are concerned about uncertainty in the future quality of protected habitat due to factors such as climate change; recent work adapts portfolio optimization tools from finance to pick portfolios of protected areas that exploit negative correlations across a landscape to minimize the uncertainty in future ecological returns (Ando et al. 2018).

Research in marine environments can also use spatial patch population dynamics (chapter ▶ "Dynamic and Stochastic Analysis of Environmental and Natural Resources"), specifically knowledge of the source/sink features of different areas in a marine landscape, to help policy makers design fishing regulations (including marine reserves) that protect overharvested species and improve social surplus in commercial fisheries (Kroetz and Sanchirico 2015). In both terrestrial and marine analyses, the best policy is not to place spatially homogeneous restrictions on human behavior (protect all wetlands, reduce total fishing effort); instead, heavy protection of core habitat (or population source sites) can be a cost-effective approach to increasing species populations (and possibly sustainable economic harvest rates).

Spatial environmental and natural resource economics has also developed tools for optimal non-reserve policy design that account for important spatial phenomena. For example, economic theory helps us understand how to make spatially explicit conservation payments. Given that the marginal benefits of conservation on one parcel often depend on the spatial configuration of conservation (or the lack thereof) on neighboring parcels, voluntary conservation programs can yield patterns of conservation that are suboptimally fragmented. Effective policies can offer payments for conservation activities which depend on the status of nearby lands. A conservation auction can be designed to implement the efficient amount of conservation and configuration of protected areas in the landscape (Polasky et al. 2014). Policy makers can also offer agglomeration bonuses – extra payments to landowners for conservation if neighboring parcels are also conserved. Such payments provide incentives that can yield less fragmented patterns of conservation in a landscape (Parkhurst et al. 2016).

In addition, policies are sometimes needed to protect natural resources from threats, and economists have studied how to design such policies efficiently when spatial features of the threats are important. For example, developing countries establish parks and protected areas within which extraction of natural resources is illegal, but it can be difficult to design a cost-effective policy to prevent illegal extraction by nearby villagers. Because extraction activities are carried out by people on foot, there is a strong spatial component to the costs and benefits of extraction in different places within a park. Optimal enforcement may be concentrated in a ring excluding the center and the perimeter of the park; in most cases, the commonly used spatially homogeneous enforcement strategy is highly inefficient (Albers and Robinson 2013). Policies to control threats to natural resources from invasive species should be spatially explicit as well, using information about spatial heterogeneity in the expected costs and benefits of invasive species control to target detection and control activities cost-effectively (Kaiser and Burnett 2010).

Finally, spatial environmental economics makes clear that we need to be careful about the spatial features of some policies designed to reduce pollution. For example, we would expect development of renewable energy sources such as solar and wind farms to reduce air pollution from electricity generation and thus might put policies in place to encourage such investments. However, the marginal emissions of changes in electricity demand vary across space and time in the U.S., so policies such as real-time electricity pricing and incentives for electric cars should also vary by location (Zivin et al. 2014).
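To make the site-selection logic at the start of this section concrete, here is a minimal sketch in which a budget-constrained reserve is chosen by exhaustive search. All parcel names, benefit scores, and costs are invented for illustration; applied work uses integer programming solvers, ecological production functions, and spatial complementarities rather than fixed per-parcel scores.

```python
from itertools import combinations

# Hypothetical parcels: name -> (conservation benefit score, acquisition cost).
# In practice, benefits come from ecological models and costs from land markets.
parcels = {
    "A": (10, 4), "B": (7, 3), "C": (6, 2), "D": (4, 3), "E": (3, 1),
}
budget = 7

def best_reserve(parcels, budget):
    """Exhaustive search over parcel subsets: maximize benefit s.t. cost <= budget."""
    best_set, best_benefit = (), 0
    names = list(parcels)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            cost = sum(parcels[p][1] for p in subset)
            benefit = sum(parcels[p][0] for p in subset)
            if cost <= budget and benefit > best_benefit:
                best_set, best_benefit = subset, benefit
    return best_set, best_benefit

sites, benefit = best_reserve(parcels, budget)
print(f"Selected parcels: {sites}, total benefit: {benefit}")
# Parcels with high benefit-cost ratios tend to be selected, as in the text.
```

With these invented numbers the search returns parcels A, C, and E. The exhaustive version makes the benefit-cost logic transparent; real problems with thousands of parcels require integer programming.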

2.2

Effect of Space on Market-Based Solutions

A standard result in environmental economics is that when pollution generates negative externalities – costs not borne by the polluter – an unregulated market will produce inefficiently large amounts of pollution. In the simplest cases, the problem of negative externalities from pollution can be solved by imposing a tax on pollution equal to the marginal external cost of the pollution evaluated at its efficient level. However, pollution and resource use often have spatially heterogeneous negative externalities. For example, air pollution is more harmful if it blows directly into populated areas, and water pollution is more harmful if emitted directly upstream from a sensitive receptor like a lake. Under these circumstances, the optimal policy response is a higher tax in locations with more impact on the receptor of concern. Policy makers should be mindful, however, that a pollution tax might have unintended effects on the location of emissions, shifting the locations of polluters and increasing the exposure of some people in the landscape to pollution (Chen et al. 2018).

Market-based solutions to externalities, such as creating tradable permits for pollution or resource use, are an alternative to taxation. Like the design of optimal taxation, market-based approaches to environmental regulation are complicated by heterogeneous spatial effects of pollution. A simple pollution permit trading regime would allow one polluter to buy a permit for one unit of emissions from another firm that reduces emissions by one unit; if these firms are in separate locations and the effect of emissions is not homogeneous across space, this simple trading regime will not result in the optimal distribution of pollution among sources. It is clearly not optimal, for example, to trade one unit of emissions in a low-impact area for one unit of emissions in a region where pollution causes more harm. Thus, efficient trading can be complicated for pollutants that have spatially explicit impacts. The same logic applies to tradable resource use permit programs when the effects of resource use vary from one location to another, as in the case of groundwater use.

One policy solution is to divide an area into subregions and only allow trading between sources that are in the same subregion, but this approach has the potential cost of creating thin markets. Another approach is to insist that pairs of sources trade permits at ratios that accurately reflect the heterogeneity of marginal damages caused by different sources, as the sketch below illustrates, but this solution creates administrative complexity. Spatial heterogeneity thus presents policy makers with trade-offs: charging firms their true marginal damage yields efficiency gains but increases the costs of complexity (including the need for increased monitoring) and raises concerns about the distributional features of spatially heterogeneous policies (Kuwayama and Brozovic 2013).
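A small numerical sketch shows why the damage-weighted trading ratios just described hold aggregate harm constant while one-for-one trades do not. The two sources and their marginal damage coefficients are invented for illustration.

```python
# Two sources: B is upwind of a city, so each unit of its emissions causes
# more damage than a unit from A. Coefficients are invented for illustration.
damage_per_unit = {"A": 1.0, "B": 3.0}
emissions = {"A": 10.0, "B": 10.0}

def total_damage(em):
    return sum(damage_per_unit[s] * e for s, e in em.items())

base = total_damage(emissions)

# One-for-one trade: B buys 1 permit unit from A (A abates 1, B emits 1 more).
one_for_one = dict(emissions, A=emissions["A"] - 1, B=emissions["B"] + 1)

# Damage-weighted trade: B may only emit MD_A / MD_B extra units per unit
# abated by A, so aggregate damage is held constant.
ratio = damage_per_unit["A"] / damage_per_unit["B"]   # = 1/3
weighted = dict(emissions, A=emissions["A"] - 1, B=emissions["B"] + ratio)

print(base, total_damage(one_for_one), total_damage(weighted))
# 40.0  42.0  40.0 -> the unweighted trade raises total damage; the
# damage-weighted trade leaves it unchanged.
```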

3

Spatial Elements of Nonmarket Valuation

Even before spatial analysis gained prominence in economics, some nonmarket valuation techniques (such as hedonic analysis and the travel-cost method) were intrinsically spatial. Environmental economists have enhanced the use of space in those methods over time, and spatial concerns have been incorporated into other nonmarket valuation tools as well (chapter ▶ “Methods of Environmental Valuation”).

3.1

Hedonic Valuation

Hedonic housing price analysis is a mainstay of the nonmarket valuation literature, grounded in the economic intuition that the price of a house will be a function of all its features, including the environmental quality and access to natural amenities associated with its specific location in space (Champ et al. 2017) (chapter ▶ "The Hedonic Method for Valuing Environmental Policies and Quality"). Sellers choose which features to supply to maximize profit; buyers choose which house to buy (at a given price) to maximize utility. The market equilibrium yields a hedonic price function (price as a function of attributes) that can be estimated econometrically using spatially explicit data on houses, their sales prices, their conventional attributes (e.g., number of rooms, square footage), and their environmental attributes. One can interpret the marginal effect of an environmental feature on price as the marginal willingness to pay of people in this market for that feature. These marginal willingness-to-pay measures inform us about the welfare effects of highly localized changes in environmental quality. However, it is notoriously difficult to use hedonic analysis to estimate the welfare effects of a widespread change in environmental conditions (e.g., cleaner air in all of Southern California), because the market equilibrium would change and create an entirely new hedonic price function, which can be difficult to predict from current conditions.

Observations in hedonic analyses can display spatial autocorrelation because of two processes, formalized below. A spatial lag process arises when the outcome observed in one location is a function of the outcomes in neighboring locations. For example, the price of one house may directly affect the prices of neighboring houses, perhaps by updating seller information about current market values. A spatial lag can also arise through the common use of a resource, such as neighbors competing with each other in the use of irrigation water (Anselin 2002). In contrast, a spatial error process refers to spatial correlation in the residuals. In the hedonic literature, several studies have used econometric approaches that take possible spatial autocorrelation from both sources into account. Failure to account for autocorrelation can yield inconsistent estimates of the coefficients on environmental quality, while failure to capture a spatial lag leads to bias, meaning that estimates of the marginal effects of changing environmental quality in one location miss spillovers onto neighboring properties (Anselin 2002).
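In standard spatial econometric notation (with P the vector of house prices, X the matrix of attributes, W a spatial weights matrix defining neighbors, and scalar parameters ρ and λ), the two processes can be written as

```latex
\text{spatial lag:}\quad P = \rho W P + X\beta + \varepsilon
\qquad\qquad
\text{spatial error:}\quad P = X\beta + u, \quad u = \lambda W u + \varepsilon .
```

This notation is the textbook convention rather than a specification taken from any particular study cited here. Under the lag model, the reduced form P = (I - ρW)^{-1}(Xβ + ε) makes explicit that a change in one house's attributes spills over to the prices of its neighbors, which is why omitting the lag term biases marginal-effect estimates.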
Estimating how much pollution affects a specific house is nontrivial, since pollution is usually measured at only a few locations in space. Thus, pollution measures are often spatially interpolated from these point data using kriging to generate an estimate of pollution at any specific latitude and longitude. Another approach to dealing with limited pollution data is to analyze housing prices within a larger spatial unit that conforms more closely to the point data, or to use geographic averages of the point measures. One concern is that, like all interpolated variables, these environmental variables are measured with error, and this measurement error may well be correlated with other unobservables that are also correlated with housing prices. For example, houses on a certain ridge could be subject to cooling ocean breezes that also produce a highly localized drop in pollution. The potential heteroscedasticity induced by using estimated pollution can be addressed by correcting for both spatial and heteroscedastic error terms. However, the more fundamental concern about omitted variable bias remains; such bias may be present even without interpolated environmental variables, for there may always be important location-specific unobserved variables that are correlated across space with both the housing price and the environmental characteristic. The problem of omitted variables can be addressed by using repeated sales of the same house over time or by including regional fixed effects. Recent studies have also deployed quasi-experimental methods from the causal inference literature to identify the causal impact of changes in environmental goods on housing prices in a hedonic framework.

Traditional hedonic analysis has employed fairly simple notions of location, space, and neighbors. For example, it has usually used measures of environmental quality on-site (e.g., air pollution levels) or simple distance to an environmental amenity or disamenity (e.g., open space, a hazardous waste site). Such simple definitions may fail to capture important effects. For example, the walking or driving time to a park might affect the price of a house more than the Euclidean distance does, and having an amenity across a major road might increase its perceived distance more than having it across a minor street. While the value of water quality improvements in a lake diminishes with the distance of a house from the lake, there may be a discontinuous jump in value at the waterfront; there is often a complex story to be told about the actual ecosystem services being valued through the proxy of pollution measures (recreation, visual aesthetics, ecological health) and the role that space plays in mediating people's experiences of those services. Furthermore, when estimating how house prices might affect each other, as with a spatial lag, houses on the same block might affect each other's values more than houses one block over, even if they are the same distance apart. Such concerns can be addressed by taking a broader spatial view of the ways in which environmental quality might affect the relative desirability of homes in a housing market, and by taking care to define variables in hedonic models to reflect spatial realities and processes on the ground. The effects of pollution may not be simple – neither uniform, nor merely a matter of being in a polygon that is contiguous with a source, nor a linear function of distance from a source. In such cases, one can use detailed information on the dispersion of the effects of pollution to inform a hedonic analysis that estimates people's willingness to pay to reduce it.

The hedonic spatial model can also be enriched by enhancing the interaction between space and time, extending the standard hedonic model to allow households to be forward looking and to face transaction costs of moving (Bishop and Murphy 2011). Under such plausible circumstances, households weigh the cost of an environmental amenity (captured by the price premium on houses in locations with good environmental values) against the present discounted value of the stream of future utility they will obtain from the amenity. Incorporating forward-looking behavior yields much larger estimates of consumer marginal willingness to pay for a spatially heterogeneous environmental amenity. A simple simulated example of hedonic estimation appears below.
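A minimal simulated example, with invented coefficients and none of the spatial lag or error corrections discussed above, shows how an estimated hedonic price function recovers marginal willingness to pay for proximity to an amenity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated house attributes (all parameter values are hypothetical).
rooms = rng.integers(2, 7, n).astype(float)
sqft = rng.normal(1500, 300, n)
dist_to_park = rng.uniform(0.1, 5.0, n)   # km to the nearest park

# "True" hedonic price function: each km farther from the park lowers
# price by $8,000, i.e., marginal WTP for proximity is $8,000 per km.
price = 40_000 + 15_000 * rooms + 90 * sqft - 8_000 * dist_to_park \
        + rng.normal(0, 10_000, n)

# Estimate the hedonic price function by ordinary least squares.
X = np.column_stack([np.ones(n), rooms, sqft, dist_to_park])
beta, *_ = np.linalg.lstsq(X, price, rcond=None)
print(f"Estimated effect of distance: ${beta[3]:,.0f} per km")
# The coefficient near -8,000 is interpreted as (negative) marginal WTP.
```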

3.2

Travel-Cost Analysis

The other nonmarket valuation method that is most intrinsically spatial is the travel cost approach to estimating the values people place on the quality of natural resources (Champ et al. 2017). This method estimates demand for recreational sites such as beaches, lakes, and forests as a function of the features of those natural sites; the results yield estimates of the values of the features (e.g., water quality, species populations) included in the analysis. The travel-cost approach uses data on how often people visit the sites of interest and how much those visits cost each individual in the data set, where travel cost depends in part on how close someone lives to a site. Single-site models use econometric analysis to estimate how the quantity of visits to a site depends on environmental quality; multiple-site models use a random-utility model (RUM) econometric approach to estimate how the choice among several sites depends on the attributes of all the sites and on how much travel to each of them costs.

Travel-cost valuation methodology has evolved to include many features of space. Researchers consider the possibility of multiple modes of travel with different time and out-of-pocket expenditures. The cost of travel is traditionally measured as a function of how far a person lives from a site, but if people engage in locational sorting, distance from a site (and hence measured travel cost) will be correlated with unobservable preference heterogeneity, creating biased coefficient estimates. Latent class models can be used to control for this endogeneity. Other problems can arise from the presence of multiple sites among which people choose for recreation (e.g., patches of a forest for hunting, lakes in a chain for fishing). Choices between substitute sites can be modeled with a nested logit model that allows some sites to be better substitutes than others. But if sites are connected physically and ecologically across space (for example, a change in water quality at one lake causes fish populations to change and redistribute through a chain of lakes), then conventional travel-cost analysis can yield misleading information about the welfare effects of that change. A structural model of recreation site choice and harvest intensity must be coupled with a spatial model of population dynamics to understand the welfare effects of improving features of one or more sites in such a system. A stripped-down sketch of the basic RUM choice calculation follows.
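The following sketch of the RUM approach computes conditional logit choice probabilities and a log-sum welfare measure for one visitor choosing among three sites. Site qualities, travel costs, and preference parameters are invented; a real application estimates the parameters from observed trips rather than assuming them.

```python
import numpy as np

# Hypothetical attributes of three recreation sites for one visitor.
quality = np.array([3.0, 5.0, 4.0])        # e.g., a water-quality index
travel_cost = np.array([10.0, 40.0, 25.0]) # dollars per trip

# Assumed utility parameters: utility rises with quality, falls with cost.
beta_quality, beta_cost = 0.8, -0.1
v = beta_quality * quality + beta_cost * travel_cost  # deterministic utility

# Conditional logit: with i.i.d. extreme-value errors, choice probabilities
# take the softmax form P_j = exp(v_j) / sum_k exp(v_k).
p = np.exp(v - v.max()) / np.exp(v - v.max()).sum()
print(p)  # the cheap nearby site competes with the high-quality distant one

# Per-occasion consumer surplus from raising site 2's quality by one unit:
# the log-sum formula (1/|beta_cost|) * [ln sum exp(v') - ln sum exp(v)],
# where |beta_cost| proxies the marginal utility of income.
v2 = v.copy()
v2[1] += beta_quality
cs = (np.log(np.exp(v2).sum()) - np.log(np.exp(v).sum())) / abs(beta_cost)
print(f"Welfare gain per choice occasion: ${cs:.2f}")
```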

3.3

Stated Preference Valuation Techniques

Stated preference valuation methodologies (contingent valuation and choice experiment studies) use information from hypothetical survey questions to estimate consumers' willingness to pay for environmental goods and services, even when the values they gain are not based in any way on direct use (Champ et al. 2017). Nonuse values may not be affected by distance to environmental amenities. However, distance may matter for the values people place on the environment if people have a localized "sense of place" or if use values make up a large fraction of the total value people place on environmental public goods (Glenk et al. 2019). Thus, space is now recognized to be an important part of even stated-preference valuation approaches. The value people place on an environmental improvement may depend on spatial variation in the current quality of the amenity in question that they experience, and on the density of substitutes for the good that are present in the area. Data on how far people are from the amenity to be valued can be included directly in the specifications of stated preference studies to measure whether the values people place on environmental goods exhibit distance decay and to ascertain how that effect varies with income. Including distance explicitly in individual willingness-to-pay functions helps cost-benefit analysts avoid making arbitrary choices about the spatial extent of the population affected by a project. A simple illustrative specification follows.
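One illustrative way to write such a willingness-to-pay specification (the functional form, the log-distance term, and the income interaction are chosen here for exposition, not taken from any study cited above) is

```latex
\mathrm{WTP}_i = \alpha + \delta \ln(d_i) + \gamma \, \ln(d_i) \cdot y_i + X_i'\beta + \epsilon_i ,
```

where d_i is respondent i's distance to the amenity, y_i is income, and X_i collects other respondent characteristics. Distance decay corresponds to δ < 0, and γ captures how the decay effect varies with income.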

4

Spatial Empirical Identification Strategies

As noted in Sect. 3.1 on hedonic analysis, space or location has long been used as a source of information to identify and estimate the effects of variation in environmental quality. The explosion in spatial data, whether from remote sensing or from geocoded household surveys, has facilitated a dramatic increase in empirical papers using variation over space to identify the effect of interventions or of pollution itself.

Many empirical papers use difference-in-differences approaches that compare treatment and control observations before and after the introduction of the treatment; a minimal version is sketched at the end of this section. This approach controls for time-invariant differences between the treated and control observations. Regression discontinuity designs make use of a fixed threshold that determines whether an observation is "treated" or not. Matching and, more recently, synthetic controls are commonly used to ensure that control and treatment units are similar on observables, but these methods are vulnerable to differences in unobservable factors. Often these approaches are used in combination with difference-in-differences, including studies that make creative use of remotely sensed or administrative data to build baselines and test for parallel trends. Recent methods to address these potential biases include doubly robust regressions (Belloni et al. 2014).

Many shocks have been used for identification in spatial environmental and natural resource economics, such as the introduction of conservation programs or a localized plant closure (e.g., Hanna and Oliva 2015). Another common approach is to use weather shocks, either to estimate their own effects or as a source of variation in pollution, for example, rainfall increasing runoff, or temperature inversions or wind affecting air pollution. Along with measuring the effects of policies on intended outcomes, these quasi-experimental techniques have been applied to the nonmarket valuation of environmental amenities and health outcomes. By definition, when using variation across space for identification, one needs a spatially varied shock. For example, spatially heterogeneous policies, such as air-pollution emission standards that vary with the non-attainment status of a county, have become popular sources of identification for estimating willingness to pay for pollution reductions or the influence of pollution on health or economic activity.

Often the treatment effect itself may vary over space, and this treatment heterogeneity may be of interest. For example, one might be particularly concerned about how well protected-area status fares in locations with high resource pressure, or be interested in how an air quality improvement program affects vulnerable neighborhoods. Along with earlier methods of estimating treatment heterogeneity, such as quantile approaches or regional interactions, several new methods have emerged to facilitate estimates of treatment heterogeneity. For example, locally or geographically weighted regressions (chapter ▶ "Geographically Weighted Regression") allow coefficients to vary over space. One concern is that, to compare estimates across space, one needs to be assured that differences in the estimates are not driven by differences in estimation bias. Another concern is that the analyst might be tempted to cherry-pick dimensions of heterogeneity to generate statistically significant results. Along with clearly pre-specifying dimensions of heterogeneity, a novel approach is instead to pre-specify a machine learning algorithm that lets variation in the data choose the dimensions of heterogeneity.

While it has some advantages, the quasi-experimental methodology has limitations as well. One challenge lies in choosing the appropriate spatial scale (chapter ▶ "Scale, Aggregation, and the Modifiable Areal Unit Problem") for analysis. Often researchers cannot observe responses at the individual level and use regional values instead. At least two problems arise from this. First, patterns of correlation among variables across space are not always robust to the spatial units over which the data are aggregated. This problem of ecological fallacy (Anselin 2002) is most pronounced if individual variation within a region is large compared with the variation among regions. Second, non-parcel-level data may not be fine enough to detect the effects of some environmental shocks, or may be so fine that they pick up uninformative variation, leading to potential biases from errors in variables (Avelino et al. 2016). Quasi-experimental studies may also yield biased results if they assume treatment effects that are constant with distance when, in fact, both the treatment itself and the impact of a treatment on housing prices are idiosyncratic across space.

Researchers have responded to the limitations of other empirical methods with a large increase in the use of field experiments in resource and environmental economics. A number of recent interventions have used randomized controlled trials (RCTs) to estimate the effect of conservation interventions (Jayachandran et al. 2017). The use of RCTs in environmental and resource economics can be challenging because of the scale required to avoid spatial spillovers or to capture spatial heterogeneity, but their application holds great potential.
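A minimal difference-in-differences sketch on simulated data (the true treatment effect of -3 and all other magnitudes are invented) illustrates the basic two-by-two comparison described above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000

treated = rng.integers(0, 2, n)   # e.g., located near a closed plant
post = rng.integers(0, 2, n)      # observed after the closure
# Outcome: pollution with a group gap, a common time trend, and a true
# treatment effect of -3 for treated units in the post period.
y = 20 + 4 * treated - 2 * post - 3 * treated * post + rng.normal(0, 1, n)

# OLS with an interaction term: the coefficient on treated*post is the
# difference-in-differences estimate of the treatment effect.
X = np.column_stack([np.ones(n), treated, post, treated * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"DiD estimate: {beta[3]:.2f}  (true effect: -3)")

# Equivalent 2x2 in means:
# (treated post - treated pre) - (control post - control pre).
did = (y[(treated == 1) & (post == 1)].mean()
       - y[(treated == 1) & (post == 0)].mean()) \
    - (y[(treated == 0) & (post == 1)].mean()
       - y[(treated == 0) & (post == 0)].mean())
print(f"2x2 means version: {did:.2f}")
```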

4.1

Environment and Health

Arguably the largest growth area in the use of these natural or quasi-experiments in environmental economics has been in measuring the effect of pollution on health. As with the willingness-to-pay literature, this is a topic that has seen broad application of hedonic analysis (chapter ▶ "The Hedonic Method for Valuing Environmental Policies and Quality"). A substantial literature measures the costs of environmental health risks and disease by using epidemiological methods to estimate a dose-response function linking, say, exposure to a chemical to health outcomes, and then using wage hedonics to estimate the perceived costs of those work-related risks (Champ et al. 2017). Recently, a burgeoning literature has shown that environmental degradation affects cognitive performance and labor productivity as well (Graff Zivin and Neidell 2013).


Various authors have used natural experiments arising from temporary plant closures or changes in traffic patterns to estimate the effect of emissions on health outcomes. Other authors have used economic downturns or regional energy policies as instruments for changes in county-level pollution to estimate the effect of pollution on health. As with other quasi-experimental studies, one concern is finding the appropriate scale of analysis. More recent papers make use of smaller-scale variation in pollution levels, using within-zip-code or within-school-district variation to better control for neighborhood fixed effects. Another approach is to use natural and environmental disasters as a source of variation to estimate the effect of those disasters on health outcomes.

One challenge is that the time and location of emissions can differ greatly from the time and location of contamination. Consider mercury, for example: airborne pollution emitted in China can affect contamination in North America, and soil and water contamination can last for many years. Random shocks in pollution transport, e.g., wind-driven smoke plumes from fires, are increasingly used as a source of exogenous variation in contamination; a stylized instrumental-variables sketch appears below. A second challenge is that, even with a good measure of contamination, avoidance behavior will affect how contamination relates to exposure. Both short-run avoidance behavior, such as staying indoors or using filters, and long-run sorting, such as migration, can lead the exposed population to be systematically different from the non-exposed population. Understanding avoidance behavior has itself become an important topic in environmental economics. In using spatial variation in treatment, researchers also need to be careful to rule out potential spillovers from the treatment into neighboring control regions; such spillovers could render the control group uncontrolled and therefore bias the estimate of the treatment effect.
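To illustrate the instrumental-variables logic behind using transport shocks, the sketch below runs a manual two-stage least squares on simulated data, with a wind shock instrumenting for local pollution. All variable names and magnitudes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Unobserved local confounder (e.g., economic activity) raises pollution
# and independently worsens the measured health index.
confounder = rng.normal(0, 1, n)
wind_shock = rng.normal(0, 1, n)   # exogenous pollution-transport shock
pollution = 1.0 * wind_shock + 1.0 * confounder + rng.normal(0, 1, n)
# True causal effect of pollution on the health index is -2.
health = -2.0 * pollution - 1.5 * confounder + rng.normal(0, 1, n)

def ols(y, X):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
# Naive OLS is biased because pollution is correlated with the confounder.
b_ols = ols(health, np.column_stack([ones, pollution]))

# Manual 2SLS: first stage predicts pollution from the wind instrument;
# second stage regresses health on predicted pollution.
g = ols(pollution, np.column_stack([ones, wind_shock]))
pollution_hat = g[0] + g[1] * wind_shock
b_iv = ols(health, np.column_stack([ones, pollution_hat]))

print(f"OLS: {b_ols[1]:.2f}, 2SLS: {b_iv[1]:.2f} (true effect: -2)")
```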

4.2

Evaluations of Protected Areas and Payment for Environmental Services Programs

Spatial analysis has been and can be used to estimate the effectiveness of conservation measures in preventing environmental degradation such as deforestation. The methodology has been developed to study programs that establish protected areas and policies that offer payments to landowners for activities that preserve or increase flows of environmental services – "payment for environmental services" (PES) programs. Location-specific attributes and the spatial process of land use play important roles in estimating the effects of these programs (Cheng et al. 2019). Early evaluations of conservation efforts compared outcomes (such as deforestation rates) in areas subject to a conservation measure, such as legal protection, with outcomes in plots outside the boundaries of that protection. The problem with this approach is that protected and unprotected areas frequently differ in ways that systematically bias the comparison. For example, countries may naturally place their protected areas in regions that face lower deforestation pressure. In these circumstances, estimates from a simple comparison of outcomes inside and outside protected-area boundaries would overstate the impact of conservation policies.


To overcome these biases and develop more accurate comparisons, conservation research must consider realistic counterfactual scenarios. Thus, researchers must adopt evaluation techniques that permit comparison of observed outcomes with what would have happened in the absence of a conservation effort. The difficulty is that counterfactuals cannot be observed directly and need to be carefully estimated (Joppa and Pfaff 2010). Many evaluations of protected areas and payments for ecosystem services use causal inference tools such as matching or, more recently, matching combined with difference-in-differences, usually to compare environmental outcome proxies (such as forest cover) or household welfare in treated areas with their matched controls. Some of these approaches take an explicitly spatial approach to the analysis of conservation program effectiveness. One can control for spatially autocorrelated errors, or control for spatial spillovers from one observation to the next by explicitly estimating the spatial lag associated with land use change. Failure to control for such spillovers can have large effects on the estimates of treatment effects. Another strategy is to estimate the effect of the program on nearby areas, or to estimate explicitly the leakage caused by the policy (Alix-Garcia et al. 2012). If there is a spatial lag process associated with deforestation, land use in observations on the boundary of the treatment area might well be affected by the treatment of the neighboring area, implying that they are not appropriate control observations. Recent work includes the use of randomized controlled trials to estimate the effects of payments for ecosystem services (Jayachandran et al. 2017). A minimal matching sketch follows.
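The following deliberately simple nearest-neighbor matching sketch (covariates, sample sizes, and the true effect are invented) shows how the naive inside/outside comparison overstates a protection effect while matching on observables comes closer to recovering it.

```python
import numpy as np

rng = np.random.default_rng(3)
n_t, n_c = 200, 1000

# Covariates that drive both protection and deforestation risk: slope and
# distance to road (protected parcels tend to be steeper and remoter).
x_t = rng.normal([1.0, 1.0], 1.0, (n_t, 2))
x_c = rng.normal([0.0, 0.0], 1.0, (n_c, 2))

def deforested(x, protected):
    # Flatter, road-accessible land is cleared more often; protection
    # lowers the clearing probability by a true effect of 0.10.
    p = np.clip(0.5 - 0.1 * x[:, 0] - 0.1 * x[:, 1] - 0.10 * protected, 0, 1)
    return p > rng.uniform(size=len(x))   # True with probability p

y_t = deforested(x_t, 1)
y_c = deforested(x_c, 0)

# Naive comparison is biased: protected parcels had low risk to begin with.
print(f"Naive difference: {y_t.mean() - y_c.mean():.3f}")

# Nearest-neighbor matching on covariates (Euclidean distance).
idx = np.array([np.argmin(((x_c - xt) ** 2).sum(axis=1)) for xt in x_t])
att = (y_t.astype(float) - y_c[idx].astype(float)).mean()
print(f"Matched ATT estimate: {att:.3f}  (true effect: -0.10)")
```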

5

Models of Behavior in Space

Until now, this chapter has largely focused on models where spatial effects arise from features of nature. Such models assume that resource locations are given and that the heterogeneous effects of pollution are determined by factors exogenous to humans, like wind or hydrology. However, spatial heterogeneity may also arise from human behavior and the resulting economic forces, and research in environmental and natural resource economics has developed an understanding of various spatial dimensions of human behavior. From the simplest von Thünen model (chapter ▶ "Classical Contributions: Von Thünen and Weber") of land use driven by transport costs to market, to the rise of New Economic Geography in the 1990s, we now have models that predict the growth of cities. The New Economic Geography approach models population centers as arising from the tension between agglomeration economies (driven by monopolistically competitive firms) and congestion costs. These models still assume at their base a featureless plain, where migration is driven by differences in real wages. Once one introduces an influential spatial feature, people with a strong preference for that amenity may migrate for other reasons. This innovation has led to the concept of spatial sorting.

Economics predicts that people respond to incentives, and incentives may themselves arise from features of the landscape other than proximity to the nearest urban center. For example, zoning and other land-use rules may place restrictions on the use of some land, pushing those land uses elsewhere (an effect also known as leakage). As land is removed from potential development, the price of development rights may increase in other regions. These and other behavioral responses are incorporated into modern models of land use and land conservation research.

Regulation of environmental and natural resource use is complicated by the existence of multiple regulators and multiple regulated actors, giving rise to the potential for strategic behavior and collective action problems. These problems gain an extra dimension of complexity when the cooperation or competition occurs over space. Spatial environmental and natural resource economics now incorporates some of these multiagent behaviors in space, including agent-based models (chapter ▶ "Cellular Automata and Agent-Based Models") to model spatial interactions and uncover emergent spatial patterns resulting from land use or technology adoption (Rasch et al. 2017); a toy agent-based sketch follows.
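As a toy illustration of the agent-based approach (the grid size, thresholds, and update rule are invented and far simpler than the behavioral rules used by Rasch et al. (2017)), the sketch below lets landowners on a grid convert land once enough neighbors have converted, producing emergent clustered conversion patterns.

```python
import numpy as np

rng = np.random.default_rng(4)
size, steps = 40, 20

# 1 = converted land (e.g., adopted a new use), 0 = not converted.
grid = (rng.uniform(size=(size, size)) < 0.05).astype(int)  # a few seeds
threshold = rng.uniform(0.1, 0.4, (size, size))  # heterogeneous agents

for _ in range(steps):
    # Count converted neighbors in each cell's 3x3 neighborhood,
    # using zero padding at the edges and excluding the cell itself.
    padded = np.pad(grid, 1)
    stacked = sum(np.roll(np.roll(padded, i, 0), j, 1)
                  for i in (-1, 0, 1) for j in (-1, 0, 1))
    share = (stacked[1:-1, 1:-1] - grid) / 8.0
    # An agent converts when peer pressure exceeds its own threshold;
    # conversion is irreversible in this toy model.
    grid = np.maximum(grid, (share > threshold).astype(int))

print(f"Converted share after {steps} steps: {grid.mean():.2f}")
# Conversion spreads outward from the initial seeds in clustered patches.
```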

5.1

Spatial Sorting Models

One recent thread of research in environmental and natural resource economics has rapidly become an established and influential feature of the literature: spatial sorting models (Kuminoff et al. 2013) (chapter ▶ "Housing Choice, Residential Mobility, and Hedonic Approaches"). This body of work evolved from early work by Tiebout on how people "vote with their feet" and move to places that have the bundles of attributes – including environmental quality and cost – they prefer. Modern spatial sorting models are theoretically and computationally complex and are used for a wide range of purposes. One category of research on sorting models is positive, seeking simply to describe whether (and if so, how) people sort across space in the face of spatial heterogeneity of attributes. This research can help us understand the forces that drive demographic patterns within urban areas and shed important light on questions of environmental justice. These models can also be used to explore how proposed changes in environmental quality will affect the distribution of people in the landscape and their subsequent well-being.

Early theoretical models of spatial sorting equilibria assumed that households have heterogeneous incomes and preferences over housing and the public good characteristics of a location. Communities vary in how expensive they are and in the level of the public good they provide. Individuals choose where to live to maximize their utility subject to their budget constraints; housing prices in communities adjust until an equilibrium is reached such that no household would prefer to live somewhere other than where it is living (a toy version of this price-adjustment logic is sketched at the end of this subsection). Even in the simplest models, assumptions must be made about the structure of indirect utility functions to ensure that an equilibrium exists. The models also assume (implicitly or explicitly) that all households have perfect information about community characteristics and the preferences of other households, that all households are able to purchase as much housing as they want in their preferred locations, and that moving is costless. The resulting equilibria have communities that are stratified by income if preferences are homogeneous, and households sorted differentially according to the features they care most about if preferences vary.

Later models allow for spillovers between the individuals who choose a given location; spillovers can be positive (as in the case of agglomeration economies) or negative (if there is congestion). Under these circumstances, multiple equilibria are often possible, particularly if there is a strong agglomeration effect. One can still use data to estimate the features of models that have multiple equilibria, but multiplicity makes it more difficult to draw conclusions about the re-sorting effects of major changes in a region, such as cleaning up a hazardous waste site.

Empirical work has sought to identify whether sorting behavior in response to spatial environmental heterogeneity is an important factor in residential markets. Econometric approaches to this problem include statistical analysis of changes over time in the socio-demographic and housing characteristics of locations near sites that experience changes in environmental quality. There is evidence that people locate at least partly in response to environmental features of neighborhoods and that such dynamics can exacerbate income segregation and environmental justice problems in urban areas (Banzhaf et al. 2019).

In a second type of research, the utility-theoretic underpinnings of sorting models have supported an approach to nonmarket valuation, estimating the values people place on elements of environmental quality that do not have market prices. Researchers can use neighborhood-level land value data to obtain structural estimates of the parameters underpinning residential sorting models and thus estimate the values of spatially differentiated environmental amenities such as air quality and open space (Klaiber and Phaneuf 2010). In addition to generating value estimates that can be used in cost-benefit analyses, this research reveals several insights about environmental policy and research. First, the benefits of an environmental improvement policy depend on how it is distributed in space. Second, benefit estimates based on traditional nonmarket-valuation techniques may be incorrect if the environmental changes to be valued are large enough to induce significant re-sorting. The structural sorting-equilibrium approach does have the advantage of taking such dynamic factors into consideration. However, it requires analysts to impose structure on the underlying model and to make arbitrary choices about the boundaries over which communities (the unit of observation) are defined.
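The sketch below implements a toy two-community version of the sorting equilibrium described above; the utility function, incomes, and price-adjustment rule are invented for illustration. Households differ in their taste for an environmental amenity, and the price of the high-amenity community rises until demand for it equals its housing stock.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000

income = rng.uniform(100, 200, n)
taste = rng.uniform(0.0, 1.0, n)       # heterogeneous taste for the amenity
amenity = {0: 1.0, 1: 2.0}             # community 1 has better air quality
capacity = {0: 600, 1: 400}            # housing stock in each community
price = {0: 20.0, 1: 20.0}             # community 1's price will adjust

def demand_for_1(p1):
    # Utility: log of residual income plus taste-weighted amenity level.
    u0 = np.log(income - price[0]) + taste * amenity[0]
    u1 = np.log(income - p1) + taste * amenity[1]
    return int((u1 > u0).sum())

# Tatonnement: raise community 1's price while it is over-demanded.
p1 = price[1]
for _ in range(20000):
    if demand_for_1(p1) <= capacity[1]:
        break
    p1 += 0.01

print(f"Equilibrium price premium for the high-amenity community: {p1 - price[0]:.2f}")
# High-taste households sort into community 1; because price sensitivity
# falls with income here, richer households are also more likely to pay
# the premium, echoing the stratification result in the text.
```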

5.2

Behavior in Resource Use and Conservation

The study of land and resource use straddles several disciplines in economics (urban economics, environmental economics, and economic geography) and has long recognized the importance of human interaction with space. Early models of land use often ignored the behavioral component and were largely meant to fit, rather than explain, the data. The next generation of models of resource use and optimal conservation planning improved economic efficiency by incorporating simple models of human responses to land use changes. For example, conservation should be targeted at parcels that are under high threat of development. Policy makers should also consider that, in general, restrictions on land use in one part of space (such as zoning) can intensify the restricted activity in other areas not covered by the restrictions and thereby undermine net environmental improvements, a phenomenon called leakage or slippage (Alix-Garcia et al. 2012).

More recent research incorporates the fact that multiple actors are involved in conservation. Endogenous fishery harvesting behavior affects the outcomes of spatially explicit harvesting regulations: if one area is closed, harvesters work more intensively in another area, and if regulations increase target populations, harvesting effort will increase. Similar dynamics occur on land; forest use restrictions near villages can displace forest extraction into neighboring parts of the forest. Socially optimal spatial resource use regulations can be designed in ways that take such endogenous behavior into account (Ando et al. 2018).

Human actors may even interact strategically. Spatial strategic behavior is best known from models of how local governments set their levels of public goods, taxes, and/or regulatory stringency. If firm location choice is endogenous, nearby jurisdictions may compete on the level of taxation and public goods. This competition is further complicated when the economic activity induced by firm location has spillovers to neighboring locations. Strategic private responses can thwart governments in many of the actions they try to take to improve environmental quality directly, creating hold-up problems when an agent is trying to establish an agglomerated protected area that requires buy-in from multiple landowners, and sometimes shifting private conservation into parts of a landscape that are spatially disparate from the locations of public conservation activities. Strategic thinking can also be important for conservation reserve design. Optimal reserve choices by one agent should take into account the likely responses of other agents and likely changes in the land market, which affect both the risk of conversion faced by other parcels and the cost to the decision maker of future conservation (Lawley and Yang 2015). Such strategic decision making can yield improved conservation outcomes but can entail seemingly counterintuitive choices, such as avoiding putting protected areas in some locations with high ecological value.

6 Conclusions

Several areas for future work in spatial environmental and natural resource economics seem especially important and promising. In particular, the increased availability of spatial data, whether from remote sensing or from geocoded administrative records, holds promise for future analysis of pollution, resource use, and welfare (Donaldson and Storeygard 2016). One avenue for future work is to incorporate the spatial attributes of these data into quasi-experimental settings. For example, weather data are usually kriged, implying potential errors across space, and novel spatial proxy measures, such as using nightlights to measure economic development, build in systematic errors that need to be considered. Relatedly, with the increased granularity of data available, understanding the appropriate spatial scale for analysis has once again become important. Particularly in the presence of spatial correlation, the choice of scale of analysis can lead not only to attenuation bias but also to potential overestimates of the effect of treatment (Avelino et al. 2016). A related concern is that a treatment itself may change the scale and scope of important spatial processes related to that treatment. For example, a fuel tax may affect the degree of spatial spillover from economic activity in one area to economic activity in neighboring areas by changing patterns of commuting behavior. These effects on spillovers may be substantial and may have large effects on policy outcomes, but they have not been studied systematically.

In the area of spatial policy design, truly optimal policies need to take spatial strategic reactions into account rather than treating other actors as merely reactive. Papers that apply game theory (chapter ▶ "Game Theoretic Modeling in Environmental and Resource Economics") to spatial policy decisions are rare; more work needs to be done in this area. For example, private actors are known anecdotally to buy land for speculation if they anticipate conservation agents wanting to buy it for protected areas. This phenomenon is different from that of markets responding to conservation with increased prices nearby and should be worked into spatial-dynamic models of optimal reserve-site selection.

Future work in spatial environmental and natural resource economics may even move to redefine what we mean by "space." Extant research in this field conceptualizes space in traditional geographic terms. However, other dimensions of space (chapter ▶ "Using Convex Combinations of Spatial Weights in Spatial Autoregressive Models") exist that may affect natural processes and human behavior. Economic interactions may facilitate technological adoption more than mere geographic proximity does. Social distance and social networks can affect attitudes and behavior by facilitating both information flow and influence. As an example, information and influence can affect individuals' valuations of a disamenity such as hazardous waste. Further, social influence can be used to improve monitoring, enforcement, and, therefore, management of a local common pool resource, such as a community pasture. Current research in spatial econometrics is moving forward to allow researchers to estimate spatial weights or spatial spillover patterns, as opposed to merely estimating the degree of spillover given an assumed structure of the extent to which different spatial units function as neighbors. These advances in spatial econometrics (see Section XI on "Spatial Econometrics") will facilitate future research that quantifies the effects of spillovers in environmental and natural resource economics.

Last, increases in computational power along with machine learning methods hold particular promise for spatial environmental and resource research. Environmental processes are highly non-linear and can vary greatly over space. The flexibility afforded by machine learning methods can capture these non-linearities without the usual need for the analyst to impose strict, and often arbitrary, functional forms. Further, these methods allow for detailed estimates of heterogeneity, which let analysts consider the important distributional effects of interventions.
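
The scale-of-analysis concern raised above can be made concrete with a small simulation. The toy example below (every number and functional form is an illustrative assumption) generates a spatially clustered treatment on a fine grid; labeling coarse blocks as treated by majority rule then attenuates the estimated effect relative to the fine-grid estimate.

```python
import numpy as np
from scipy.ndimage import uniform_filter

# Toy illustration of scale choice: true cell-level effect tau = 1.0, with
# spatially correlated treatment assignment and spatially correlated noise.
rng = np.random.default_rng(1)
n, tau = 120, 1.0
latent = uniform_filter(rng.normal(size=(n, n)), size=15)
treat = (latent > 0).astype(float)                # clustered binary treatment
noise = 5 * uniform_filter(rng.normal(size=(n, n)), size=15)
y = tau * treat + noise                           # outcome on the fine grid

def block_mean(a, b):
    """Aggregate an n-by-n array to (n/b)-by-(n/b) block means."""
    return a.reshape(n // b, b, n // b, b).mean(axis=(1, 3))

slope = lambda x, z: np.polyfit(x.ravel(), z.ravel(), 1)[0]
print(f"fine grid:     tau_hat = {slope(treat, y):.2f}")
for b in (6, 20):
    t_blk = (block_mean(treat, b) > 0.5).astype(float)   # majority-rule label
    print(f"{b:2d}x{b} blocks:  tau_hat = {slope(t_blk, block_mean(y, b)):.2f}")
```

Results from any single simulated draw will bounce around, but the bias from coarse, misclassified treatment labels is systematic; Avelino et al. (2016) discuss the analogous choices in evaluating real conservation programs.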


Knowledge in spatial environmental and natural resource economics already includes theoretical and empirical models that inform spatial environmental policy design, evaluate policy effectiveness, help us predict human behavior in a landscape, and help place values on environmental goods that are spatially heterogeneous and convey benefits in ways that vary with spatial processes. However, work in this field is very much ongoing, and much more remains to be done.

7 Cross-References

▶ Cellular Automata and Agent-Based Models
▶ Classical Contributions: Von Thünen and Weber
▶ Cross-Section Spatial Regression Models
▶ Dynamic and Stochastic Analysis of Environmental and Natural Resources
▶ Game Theoretic Modeling in Environmental and Resource Economics
▶ Geographically Weighted Regression
▶ Housing Choice, Residential Mobility, and Hedonic Approaches
▶ Interpreting Spatial Econometric Models
▶ Methods of Environmental Valuation
▶ Scale, Aggregation, and the Modifiable Areal Unit Problem
▶ The Hedonic Method for Valuing Environmental Policies and Quality
▶ Using Convex Combinations of Spatial Weights in Spatial Autoregressive Models

Acknowledgments This chapter is based in part on work supported by USDA-NIFA Hatch project number #ILLU-470-316. Lead authorship is equally shared by the two coauthors. The authors thank the editors of this volume for useful comments on the manuscript.

References

Albers HJ, Robinson EJZ (2013) A review of the spatial economics of non-timber forest product extraction: implications for policy. Ecol Econ 92:87–95
Alix-Garcia JM, Shapiro EN, Sims KRE (2012) Forest conservation and slippage: evidence from Mexico's national payments for ecosystem services program. Land Econ 88(4):613–638
Ando A, Howlader A, Mallory M (2018) Diversifying to reduce conservation outcome uncertainty in multiple environmental objectives. J Agric Resour Econ 47(2):220–238
Anselin L (2002) Under the hood: issues in the specification and interpretation of spatial regression models. Agric Econ 27(3):247–267
Avelino A, Baylis K, Honey-Rosés J (2016) Goldilocks and the raster grid: choosing a unit of analysis for evaluating conservation programs. PLoS One 12(11):e0167945
Banzhaf S, Ma L, Timmins C (2019) Environmental justice: the economics of race, place, and pollution. J Econ Perspect 33(1):185–208
Belloni A, Chernozhukov V, Hansen C (2014) High-dimensional methods and inference on structural and treatment effects. J Econ Perspect 28(2):29–50
Bishop KC, Murphy AD (2011) Estimating the willingness to pay to avoid violent crime: a dynamic approach. Am Econ Rev Pap Proc 101(3):625–629
Champ PA, Boyle KJ, Brown TC (eds) (2017) A primer on nonmarket valuation, The economics of non-market goods and resources, vol 13. Springer, Dordrecht
Chen Z, Kahn ME, Liu Y, Wang Z (2018) The consequences of spatially differentiated water pollution regulation in China. J Environ Econ Manag 88:468–485
Cheng SH, Ahlroth S, Onder S, Shyamsundar P, Garside R, Kristjanson P, Mckinnon MC, Miller DC (2019) A systematic map of evidence on the contribution of forests to poverty alleviation. Environ Evid 8(3):1–22
Donaldson D, Storeygard A (2016) The view from above: applications of satellite data in economics. J Econ Perspect 30(4):171–198
Glenk K, Johnston RJ, Meyerhoff J, Sagebiel J (2019) Spatial dimensions of stated preference valuation in environmental and resource economics: methods, trends and challenges. Environ Resour Econ. https://doi.org/10.1007/s10640-018-00311-w
Graff-Zivin J, Neidell M (2013) Environment, health and human capital. J Econ Lit 51(3):689–730
Hanna R, Oliva P (2015) The effect of pollution on labor supply: evidence from a natural experiment in Mexico City. J Public Econ 122:68–79
Jayachandran S, de Laat J, Lambin EF, Stanton CY, Audy R, Thomas NE (2017) Cash for carbon: a randomized trial of payments for ecosystem services to reduce deforestation. Science 357(6348):267–273
Joppa L, Pfaff A (2010) Reassessing the forest impacts of protection: the challenge of nonrandom location and a corrective method. Ann N Y Acad Sci 1185:135–149
Kaiser BA, Burnett KM (2010) Spatial economic analysis of early detection and rapid response strategies for an invasive species. Resour Energy Econ 32(4):566–585
Klaiber HA, Phaneuf DJ (2010) Valuing open space in a residential sorting model of the Twin Cities. J Environ Econ Manag 60(2):57–77
Kroetz K, Sanchirico JN (2015) The bioeconomics of spatial-dynamic systems in natural resource management. Ann Rev Resour Econ 7(1):189–207
Kuminoff NV, Smith VK, Timmins C (2013) The new economics of equilibrium sorting and policy evaluation using housing markets. J Econ Lit 51(4):1007–1062
Kuwayama Y, Brozović N (2013) The regulation of a spatially heterogeneous externality: tradable groundwater permits to protect streams. J Environ Econ Manag 66(2):364–382
Lawley C, Yang W (2015) Spatial interactions in habitat conservation: evidence from prairie pothole easements. J Environ Econ Manag 71:71–89
Newbold SC, Siikamäki J (2015) Conservation prioritization using reserve site selection methods, chapter 13. In: Halvorsen R, Layton DF (eds) Handbook on the economics of natural resources. Edward Elgar, Northampton, pp 358–398
Parkhurst GM, Shogren JF, Crocker T (2016) Tradable set-aside requirements (TSARs): conserving spatially dependent environmental amenities. Environ Resour Econ 63(4):719–744
Polasky S, Lewis DJ, Plantinga AJ, Nelson E (2014) Implementing the optimal provision of ecosystem services. Proc Natl Acad Sci 111(17):6248–6253
Rasch S, Heckelei T, Storm H, Oomen R, Naumann C (2017) Multi-scale resilience of a communal rangeland system in South Africa. Ecol Econ 131:129–138
Zivin JSG, Kotchen MJ, Mansur ET (2014) Spatial and temporal heterogeneity of marginal emissions: implications for electric cars and other electricity-shifting policies. J Econ Behav Organ 107:248–268

Further Reading

Bateman I, Yang W, Boxall P (2006) Geographical information systems (GIS) and spatial analysis in resource and environmental economics. In: Tietenberg T, Folmer H (eds) The international yearbook of environmental and resource economics 2006/2007: a survey of current issues. Edward Elgar, Northampton, pp 43–92
Cropper ML, Oates WE (1992) Environmental economics: a survey. J Econ Lit 30(2):675–740
Fischer MM, Getis A (eds) (2010) Handbook of applied spatial analysis: software tools, methods, and applications. Springer, Berlin/Heidelberg
Geoghegan J, Gray W (2005) Spatial environmental policy. In: Tietenberg T, Folmer H (eds) The international yearbook of environmental and resource economics 2005/2006: a survey of current issues. Edward Elgar, Northampton, pp 52–96
Irwin EG, Bell KP, Bockstael NE, Newburn DA, Partridge MD, Wu J (2009) The economics of urban-rural space. Ann Rev Resour Econ 1(1):435–459
Khandker SJ, Koolwal GB, Samand HA (2010) Handbook on impact evaluation: quantitative methods and practices. World Bank, Washington, DC
Polasky S (2005) Strategies to conserve biodiversity. In: Tietenberg T, Folmer H (eds) The international yearbook of environmental and resource economics 2005/2006: a survey of current issues. Edward Elgar, Northampton, pp 157–184
Viscusi K, Gayer T (2005) Quantifying and valuing environmental health risks, chapter 20. In: Mäler KG, Vincent J (eds) Handbook of environmental economics, vol 2. Elsevier, Amsterdam, pp 1029–1103

Dynamic and Stochastic Analysis of Environmental and Natural Resources

Yacov Tsur and Amos Zemel

Y. Tsur (*)
Department of Environmental Economics and Management, The Hebrew University of Jerusalem, Rehovot, Israel
e-mail: [email protected]

A. Zemel
Department of Solar Energy and Environmental Physics, Jacob Blaustein Institutes for Desert Research, Ben Gurion University of the Negev, Beersheba, Israel
e-mail: [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_51

Contents
1 Introduction
2 The Canonical Resource Management Model
3 Resource Management Under Uncertainty
3.1 Uncertain T
3.2 Stochastic Stock Dynamics
3.3 Discounting
3.4 Instantaneous Benefit
3.5 Post-Planning Value
3.6 Compound Uncertainties
4 Integrating Natural Resources and Aggregate Growth Models
4.1 An Integrated Model
4.2 Uncertainty in the Integrated Model
5 Irreversibility and Uncertainty
6 Knightian Uncertainty
7 Conclusions
References

Abstract

Uncertainty affects the dynamic trade-offs of environmental and natural resource management in a variety of ways and forms. The uncertain responses to anthropogenic activities may be due to genuine stochastic processes that drive the evolution of the underlying natural systems or simply due to our poor understanding of these complex systems and their interactions with the exploitation policies. These interactions are of particular importance when the ecosystem response might involve irreversibility, so that unexpected undesirable outcomes cannot be undone after they are realized. In this chapter, we review the various sources of uncertainty, the methodologies developed to account for them, and the implications regarding the management of environmental and natural resources.

Keywords

Discount rate · Optimal policy · Regime shift · Steady state distribution · Nonrenewable resource

1 Introduction

Environmental and resource economics is the branch of economics in which human activities interact with natural processes, giving rise to complex dynamical systems. Since the natural processes that constrain the options open to resource managers evolve in ways that are often poorly understood, the responsible management of natural resources must account for the dynamical and uncertainty aspects of the combined human-natural systems. These two aspects form the central theme of this chapter.

The importance of uncertainty considerations in the design of environmental policies has long been recognized, and the literature dealing with this topic is vast (see Mangel (1985) and the reviews of Heal and Kriström (2002) and Pindyck (2007)). In this chapter, we consider this issue emphasizing the rich variety of forms in which uncertainty enters all components of the management problem.

Uncertainty stems from two main sources: (a) our own limitations in understanding key natural and economic parameters and (b) genuine stochastic elements that govern the evolution of the systems under consideration. It can show up as unpredictable disturbances to the evolution of an ecosystem, either in the form of abrupt discrete occurrences ("catastrophic events") or as an ongoing stream of small stochastic shocks which drive diffusion processes that need to be controlled. Obviously, the diversity of uncertainty sources and types calls for a variety of methods to model and handle them, as well as for various (often conflicting) policy measures to respond to their influence on the systems to be managed.

Here we review various methods and approaches that have been considered in the literature for dealing with uncertainty in the context of natural resource management. We begin with a schematic ("canonical") resource management model (Sect. 2) and proceed to show how the various types of uncertainty enter each of its elements (Sect. 3). In actual practice, resource managers may face more than a single type of uncertainty at the same time. We point out that the interaction between the various types can give rise to new complex effects.

In a more general setup, the management problem cannot be restricted to the resource sector but must be considered in a wider context, with various economy-wide variables both affecting and being affected by the environmental and natural resource sectors. To account for such considerations, we describe a framework that integrates natural resources and aggregate economic growth and use it to discuss additional effects of uncertainty (Sect. 4). In Sect. 5 we direct attention to the concept of irreversibility characterizing many resource management situations. Irreversible outcomes are particularly relevant when coupled with uncertainty, because they can otherwise be anticipated and avoided when so desired. Finally, we discuss briefly the case of Knightian uncertainty (Sect. 6), under which the underlying structure of uncertainty (e.g., the specification of the underlying distribution) is incompletely known.

2 The Canonical Resource Management Model

In a typical resource management situation, an initial resource stock Q0 is to be exploited over some planning horizon t ∈ [0, T], t being the running time index and T the end of the planning period, which may or may not be predetermined. At any instant of time, the remaining stock Q(t) is given, and the exploitation rate q(t) generates the instantaneous benefit u(Q(t), q(t), t) and changes Q(t) according to

Q̇(t) ≡ dQ(t)/dt = G(Q(t), q(t), t).   (1)

A simple example of a stock dynamic process is obtained from the specification G(·) = R(Q) − q, where R(·) represents natural recharge (growth, replenishment). For nonrenewable resources, for example, minerals, R vanishes at all times and G = −q. An exploitation policy {T, q(t), t ∈ [0, T]} generates the payoff

∫₀ᵀ u(Q(t), q(t), t) e^{−ρt} dt + e^{−ρT} v(Q(T))   (2)

where ρ is the time rate of discount and v(·) is the post-planning value (the present value at time T of the benefit stream over the post-planning period t > T). The policy is feasible if it satisfies some given constraints on T and on {Q(t), q(t), t ∈ [0, T]}; for example, T is given or restricted to a certain range, the stock Q(t) is positive or bounded in some range, and q(t) ≥ 0 for all t ∈ [0, T]. We denote by Γ the set of all feasible policies. The optimal policy is the feasible policy that maximizes Eq. (2) subject to Eq. (1) given Q(0) = Q0. The value of Eq. (2) obtained under the optimal policy is denoted V(Q0; Γ) and is called the value function. For brevity, the argument Γ is often dropped, leaving the initial resource stock as the sole argument of the value function.

The formulation of the resource management problem in this way started with Hotelling (1931), who considered exhaustible (nonrenewable) resources and characterized optimal extraction policies in different market settings, using the calculus of variations to verify economic reasoning. The development of optimal control and dynamic programming methods opened the way for a wide range of extensions, including the incorporation of uncertainty of various kinds and forms. In real-world situations, uncertainty is likely to be present in each of the components of the resource management problem: the planning horizon T, the instantaneous benefit u(·, ·, ·), the discount rate ρ, the post-planning value v(·), the recharge process R(·), the initial reserve Q0, as well as the specification of the feasibility constraints. In this chapter, we survey different approaches to deal with uncertainties often encountered in resource management problems.

Before delving into extensions involving uncertainty, it is expedient to summarize the salient properties of the optimal policy of the canonical management problem formulated above. Suppose that at some time t the resource owner is offered the opportunity to increase the remaining stock Q(t) by a marginal unit. What is the maximal amount the owner will be willing to pay (at time t) to realize this opportunity? The answer, obviously, is the contribution of the added stock to the resource value at time t, that is, V′(Q(t)) ≡ ∂V(Q)/∂Q evaluated at Q = Q(t). Let λ(t) represent this opportunity cost at time t when the remaining stock is Q(t). The variable λ(·) comes under various names, including costate, shadow price, scarcity or royalty rent, and in situ value. By definition, it embodies the economic implications of stock changes, such as increasing extraction costs as the resource dwindles and the price of scarcity when a nonrenewable resource is nearing depletion.

Exploitation at the rate q(·) bears two effects. First, it provides the instantaneous gratification u(·). Second, it changes the available stock via Eq. (1), hence the potential to enjoy future gratifications. The (current value) Hamiltonian

H(Q, q, λ, t) ≡ u(Q, q, t) + λG(Q, q, t)

balances these two effects such that the optimal exploitation rate maximizes it at each point of time. The economic interpretation of this "maximum principle" is readily seen under the specification G(Q, q, t) = R(Q) − q and when the maximization admits an internal solution, in which case the optimal rate q satisfies ∂u/∂q = λ: Along the optimal path, the marginal benefit from exploitation should equal the shadow price of the resource, that is, the marginal cost of exploitation. Once the λ(·) process is given, the Hamiltonian maximization determines the optimal exploitation rate and, via Eq. (1), the ensuing stock process for the entire planning period t ∈ [0, T]. Solving the management problem, then, requires the determination of the shadow price process, for which optimal control and dynamic programming are two approaches.

In many cases, the optimal stock process Q(·) approaches a steady state (perhaps only asymptotically when T = ∞), where exploitation and natural recharge just balance each other out. This is the case, for example, in infinite horizon, autonomous problems (where the time argument enters explicitly only via discounting) involving a single stock. In such problems, it has been shown that the optimal stock process is monotonic, hence (when bounded) must eventually converge to a steady state. Deriving the steady state is relatively easy even for problems that do not admit analytic solutions for the full dynamic evolution. Comparing the steady states under different conditions (model specifications, parameter values) provides a simple way to study the effects of changes in the underlying conditions on the optimal policy.

The canonical resource management problem has been studied extensively, and the relevant literature is vast. For detailed treatments, we refer to Clark (1976) and Dasgupta and Heal (1979), who discussed resource management in a variety of situations, emphasizing renewable and nonrenewable resources, respectively.
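
As a concrete illustration of working with steady states, consider a sketch under assumed functional forms: logistic recharge R(Q) = rQ(1 − Q/K) and net benefit u(Q, q) = (p − c/Q)q. For this linear-in-extraction specification, the interior optimal steady state satisfies the standard condition R′(Q) − c′(Q)R(Q)/(p − c(Q)) = ρ (see Clark 1976), which can be solved numerically:

```python
import numpy as np
from scipy.optimize import brentq

# Illustrative parameter values only.
r, K, p, c, rho = 0.5, 1.0, 2.0, 0.4, 0.05

R  = lambda Q: r * Q * (1 - Q / K)   # logistic recharge
Rp = lambda Q: r * (1 - 2 * Q / K)   # its derivative R'(Q)
cost  = lambda Q: c / Q              # unit extraction cost, falls with the stock
costp = lambda Q: -c / Q**2          # c'(Q)

# Modified golden rule for the linear-in-q model:
golden = lambda Q: Rp(Q) - costp(Q) * R(Q) / (p - cost(Q)) - rho
Q_star = brentq(golden, 0.3, 0.95)   # bracket chosen to contain the root
print(f"steady-state stock Q* = {Q_star:.3f}, extraction q* = R(Q*) = {R(Q_star):.3f}")
```

Re-solving for Q* under different values of ρ, p, or c is exactly the kind of comparative steady-state exercise described above.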

3 Resource Management Under Uncertainty

As mentioned above, uncertainty abounds in resource management situations. It is important to distinguish at the outset between two types of uncertainty, depending on its origin. The first type is due to the participants’ (resource owners, users, regulators, etc.) limited knowledge of certain parameters or functional relations characterizing the resource and the economic systems under consideration. The second type is due to genuine random elements often encountered when dealing with Mother Nature. We refer to the former type as ignorance uncertainty and to the latter as exogenous uncertainty. For example, the recharge or instantaneous benefit may undergo an abrupt shift when the stock process crosses some threshold, but the exact location of this threshold is a priori unknown. There is nothing inherently random in the threshold parameter, except that it is unknown to the resource manager; hence, the uncertainty is due to ignorance. If, however, the abrupt regime shift depends also on exogenous environmental factors such as weather variables affecting the outburst of a pollution-induced disease, then its occurrence is triggered by the confluence of environmental conditions which are genuinely stochastic, and the uncertainty regarding the abrupt shift is exogenous. How to handle a particular source of uncertainty depends to a large extent on its type. We proceed now to discuss the incorporation of uncertainty, considering in turn each component of the above canonical resource management model.

3.1 Uncertain T

Some resource management problems do not admit a natural completion time, in which case the planning horizon becomes infinite (T = ∞). In other cases, extraction must cease at a finite date T, while the considerations related to later periods are summarized in the post-planning value v(Q(T)). For example, mine developers may be permitted to extract the mineral only until some given date T when their concession expires. Moreover, the depletion of nonrenewable resources (or of renewable resources like fisheries that can be exploited to extinction) marks the end of the planning horizon, which depends on the extraction policy. In these cases, the planning horizon is either given exogenously or is a decision variable which can be determined for any extraction policy. In either case, its incorporation within the management problem involves no uncertainty and poses no particular difficulty.

In many situations, however, T is subject to uncertainty. A prominent example is that of an unknown initial stock – a situation studied initially by Kemp (1976). In such cases, T is a random variable whose realization marks the depletion of the resource, at which time management shifts to the post-planning period. A slight extension of the term "depletion" to include situations in which the resource can no longer be exploited or becomes obsolete allows one to associate T with an uncertain date of nationalization (Long 1975) or with the uncertain arrival of a backstop substitute (Dasgupta and Heal 1974; Dasgupta and Stiglitz 1981). Cropper (1976) presented the problem in an environmental pollution context, identifying T with the random triggering of various environmental catastrophes. While the uncertainty in the cake-eating problem of Kemp (1976) is solely due to ignorance, the uncertainty in political (nationalization) or economic (technological breakthrough) events often involves genuine stochastic elements and is therefore exogenous. The distinction between the two types of uncertainty plays out most pronouncedly via the specification of the hazard rate function, measuring the probability density of the event occurrence (the realization of T) in the next time instant. In all of these variants, the management problem seeks to maximize the expected value of the objective Eq. (2) with respect to the distribution of T, and the latter closely depends on the type of uncertainty.

3.1.1 Ignorance Uncertainty

A common ignorance-uncertainty situation involves a catastrophic event triggered by the stock falling below some unknown threshold. Examples, in addition to Kemp's cake-eating problem, include seawater intrusion into coastal aquifers (Tsur and Zemel 1995) and global warming-induced catastrophes (Tsur and Zemel 1996; Nævdal 2006). The hazard rate in this case measures the probability of crossing the threshold during the next time instant. If the stock process does not decrease (e.g., extraction does not exceed the natural recharge) or if the stock process was in the past strictly lower than its current level, the hazard vanishes (it is certain that the threshold will not be crossed in the next time instant). In contrast, decreasing stock processes proceed under risk of occurrence. This feature complicates the formulation and solution of the management problem. The situation is greatly simplified if only monotonic stock processes are allowed. It turns out that in many cases of interest the optimal stock process is indeed monotonic.

The characterization of the optimal monotonic stock process proceeds along the following steps. Let Q̂ᶜ be the optimal steady state of the risk-free (canonical) problem. Consider an initial stock Q0 < Q̂ᶜ. Since it is not optimal to decrease the stock further even without the risk of triggering a damaging event, it is obviously not optimal to do so under the event risk. The optimal process under occurrence threat, then, coincides with the (increasing) risk-free process and approaches a steady state at Q̂ᶜ.

Suppose that Q0 > Q̂ᶜ. Then, the optimal stock process cannot increase. For if it increases, the monotonicity property implies that it will never decrease, in which case the hazard vanishes at all times and the problem reduces to that of the risk-free problem. But without the occurrence risk, the optimal stock process converges to Q̂ᶜ – a contradiction. So when Q0 > Q̂ᶜ, the optimal stock process is nonincreasing.

Let X denote the unknown threshold stock with the probability distribution F(Q) ≡ Pr{X ≤ Q} and the corresponding density f(Q) = F′(Q). For a decreasing stock process, the distribution

F_T(t) ≡ Pr{T ≤ t} = Pr{X ≥ Q(t)} = 1 − F(Q(t))

and the density f_T(t) = F′_T(t) = −f(Q(t)) Q̇(t) of the random occurrence time T determine the expected payoff (the expectation of Eq. (2) with respect to T). This expected payoff defines the objective of a deterministic management problem, denoted the "auxiliary" problem, which also admits a monotonic optimal stock process that converges to a steady state Q̂ᵃᵘˣ > Q̂ᶜ. It turns out that the resource management problem under an uncertain threshold splits into two distinct deterministic subproblems, depending on the initial stock: for Q0 < Q̂ᶜ, the optimal stock process is the same as the increasing risk-free process, and the occurrence risk can be ignored; for Q0 > Q̂ᵃᵘˣ, the optimal process coincides with the decreasing auxiliary process, and the occurrence risk is relevant. If Q0 ∈ [Q̂ᶜ, Q̂ᵃᵘˣ], the uncertainty process enters a steady state instantly (at the initial state Q0) because any other policy is ruled out by the above considerations. The steady-state interval [Q̂ᶜ, Q̂ᵃᵘˣ] is a peculiar feature, unique to optimal behavior under ignorance uncertainty. Note the prudence implications of this characterization: Decreasing stock processes turn on the occurrence risk and hence approach a higher (and safer) steady state than that obtained without occurrence risk.

Another interesting observation relates to the role of learning in this model. Decreasing stock processes provide new information regarding the threshold location as these processes proceed. This information, however, is already accounted for by the auxiliary objective, and the resource owners have no reason to update the original policy (designed at t = 0) as the information accumulates, unless the process is interrupted at some time by the catastrophic occurrence.
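
A small numerical illustration of the occurrence-time distribution just derived (with purely illustrative numbers): for a threshold X distributed uniformly on (0, 1) and a declining stock path Q(t) = Q0 e^{−gt}, the probability that the event has occurred by time t is F_T(t) = 1 − F(Q(t)).

```python
import numpy as np

# Toy numbers: uniform threshold, exponentially declining stock path.
Q0, g = 0.9, 0.05
t = np.linspace(0.0, 40.0, 9)
Q = Q0 * np.exp(-g * t)              # declining stock path
F = lambda q: np.clip(q, 0.0, 1.0)   # CDF of Uniform(0, 1)
F_T = 1 - F(Q)                       # Pr{T <= t} = 1 - F(Q(t))
for ti, qi, fi in zip(t, Q, F_T):
    print(f"t = {ti:4.1f}  Q(t) = {qi:.3f}  Pr{{T <= t}} = {fi:.3f}")
```

Note that F_T(0) = 1 − F(Q0) > 0 here: with a uniform threshold there is a positive probability that the threshold lies above the initial stock, so the event may strike immediately.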

3.1.2 Exogenous Uncertainty

Under exogenous uncertainty, the event is triggered by genuinely random conditions, and the probability of occurrence within the next time instant is measured by the hazard rate (Long 1975; Cropper 1976; Heal 1984). The hazard rate in this case depends neither on the history of the process nor on its trend (increasing or decreasing); hence, the splitting of the uncertainty problem into two distinct subproblems (that gave rise to the equilibrium interval under ignorance uncertainty) does not occur. The hazard rate can, however, depend on the current resource stock and exploitation rate, which allows the owners to affect, even if not avoid completely, the risk of future occurrence by adjusting the extraction policy. This type of event has been assumed in a variety of resource models, including Deshmukh and Pliska (1985), who studied exploitation and exploration of nonrenewable resources, Reed and Heras (1992) in the context of biological resources vulnerable to a catastrophic collapse, Clarke and Reed (1994) and Tsur and Zemel (1998) in the context of pollution control, Cropper (1976) and Aronsson et al. (1998), who considered the risk of nuclear accidents, and Gjerde et al. (1999) and Bahn et al. (2008) in the context of climate policies under risk of environmental catastrophes.

Given the stock process Q(·), the stock-dependent hazard process h(·) is related to the probability distribution and density of the event occurrence time, F(t) = Pr{T ≤ t} and f(t) = F′(t), according to

h(Q(t))Δ ≡ Pr{T ∈ (t, t + Δ) | T > t} = [f(t)/(1 − F(t))] Δ.

Thus, h(Q(t)) = −d ln(1 − F(t))/dt; hence,

F(t) = 1 − e^{−∫₀ᵗ h(Q(s)) ds}   and   f(t) = h(Q(t))[1 − F(t)].

The expectation (with respect to T) of the objective Eq. (2) becomes

∫₀^∞ [u(Q(t), q(t), t) + h(Q(t))v(Q(t))] e^{−∫₀ᵗ [ρ + h(Q(τ))] dτ} dt.   (3)

The optimal policy is the feasible policy that maximizes the objective Eq. (3) subject to Eq. (1) and Q(0) = Q0. In this way, the uncertainty problem is recast as a standard deterministic infinite horizon problem. Its optimal policy is relevant only as long as the event has not occurred. Once the event occurs, the optimal policy switches to that of the post-event problem (represented by the post-event value v).

The event occurrence risk affects the resource management problem via the hazard rate, which enters the objective Eq. (3) both in the discount rate and in the instantaneous benefit (u + hv). The discount rate increases from ρ to ρ + h, with two conflicting effects. First, the increased impatience (due to the higher discount rate) promotes aggressive exploitation (less conservation). Second, the discount rate ρ + h(Q) turns endogenous through its dependence on the stock. The possibility to control the discount rate via the extraction policy typically encourages conservation, and the trade-offs associated with the discounting effect are represented by the proportional rate of change of the hazard, h′(Q)/h(Q). The other effect of the occurrence threat on the management problem comes through the h(Q)v(Q) term, which is added to the instantaneous benefit in the objective Eq. (3). When this term depends on the stock, the resource owners can control the expected damage of the event by adjusting the extraction policy. The overall uncertainty effect results from balancing these conflicting trends.

In a particularly simple example, the post-event value v(·) vanishes identically at all Q levels. This is the case, for example, when the event occurrence renders the resource obsolete with no further consequences or when it is possible to renormalize the instantaneous benefit in such a way that the post-event value vanishes (see, e.g., Tsur and Zemel 2009; Karp and Tsur 2011). In this case, only the discounting effects remain. When the hazard is independent of the stock, only the impatience effect is active, and the ensuing optimal policy entails more aggressive exploitation than its risk-free counterpart: if the world may come to an end tomorrow and there is nothing we can do about it, we may as well exploit the resource today while we can. However, if the hazard is sensitive to the resource stock, such that more exploitation increases the occurrence probability, then the endogeneity of the discount rate encourages conservation. Which of these effects dominates depends on h′(Q)/h(Q) (see the discussion in de Zeeuw and Zemel 2012).

A slightly more general formulation describes the post-event value v(·) in terms of a penalty inflicted upon occurrence. Tsur and Zemel (1998) distinguish between single occurrence and "recurrent" events. The latter entail multiple penalties inflicted each (random) time the event occurs. For penalty functions that decrease with the stock, both types of events imply more conservative exploitation vis-à-vis the risk-free policy. A prominent example of recurrent events is the case of forest fires, which affect forest rotation management (see Reed 1984).

Events that impact ecosystems often entail abrupt changes in the system dynamics. The post-event value in such cases is the outcome of the (risk-free) post-occurrence optimization problem proceeding under the new regime. When the change in dynamics implies a loss (e.g., via reduced natural replenishment of the resource), the extraction policy under uncertainty is more conservative than its risk-free counterpart (see Polasky et al. 2011 and the references they cite). In fact, the discrete regime shift is in many cases a simplified description of actual complex non-convex dynamical processes which give rise to fast transitions among locally stable basins of attraction and to hysteresis phenomena. However, when our interest is focused on the economic implications of the shift (rather than on the exact dynamics driving it), this simplification can yield interesting insights.

Catastrophic events of global nature, such as those induced by global warming, are often exogenous to local decision units (countries, regions). In such cases, the occurrence hazard is taken parametrically by the decision maker. The damage inflicted by the event, however, may vary across locations, with particularly grave outcomes for some specific nations. A possible response by local governments to this state of affairs is to consider adaptation activities in order to reduce or eliminate the damage that will be inflicted by the event, should the mitigation efforts (via reduced exploitation) fail to avoid its occurrence. The adaptation activities entail some given costs, while the benefit (of reduced damage) will be enjoyed only following the (uncertain) occurrence date. The optimal adaptation policy should balance these costs and benefits (see de Zeeuw and Zemel 2012 and references therein).
When the occurrence probability can be affected by mitigation policies, the two policy measures interact strongly and must be considered simultaneously to obtain optimal outcomes. Indeed, the mere presence of the adaptation option can modify the extraction policy even prior to the actual implementation of this option.

Our discussion has focused on unfavorable events such as environmental catastrophes. Favorable events, for example, technological breakthroughs, can be modeled in a similar way. Early studies of the uncertain arrival of a backstop substitute for nonrenewable resources with R&D efforts include Dasgupta et al. (1977), Kamien and Schwartz (1978), and Davison (1978). Bahn et al. (2008) considered such events in a renewable resource context of a climate policy that includes R&D efforts to develop clean energy technologies.
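
The recast objective Eq. (3) also lends itself to direct numerical evaluation. The sketch below (every functional form is an illustrative assumption) computes the expected payoff of candidate stationary extraction rates when the post-event value vanishes, v ≡ 0, so that only the hazard's discounting effects remain.

```python
import numpy as np

# Toy specification: logistic recharge, a hazard that rises as the stock
# falls, logarithmic benefit, and v = 0 (no post-event value).
rho, r, K = 0.05, 0.5, 1.0
R = lambda Q: r * Q * (1 - Q / K)            # natural recharge
h = lambda Q: 0.2 * (1 - Q / K)              # hazard rises as the stock falls
u = lambda q: np.log(max(q, 1e-9))           # instantaneous benefit

def expected_payoff(q_rate, Q0=0.8, T=400.0, dt=0.01):
    Q, disc, total = Q0, 1.0, 0.0
    for _ in range(int(T / dt)):
        q = min(q_rate, Q / dt)              # cannot extract more than the stock
        total += u(q) * disc * dt
        disc *= np.exp(-(rho + h(Q)) * dt)   # hazard-augmented discounting
        Q = max(Q + (R(Q) - q) * dt, 0.0)
    return total

for q_rate in (0.05, 0.10, 0.15):
    print(f"q = {q_rate:.2f}: expected payoff = {expected_payoff(q_rate):8.2f}")
```

A grid search over such candidate rates is a crude stand-in for the optimal control solution, but it makes visible how extraction that depletes the stock raises the effective discount rate ρ + h(Q) and erodes the expected payoff.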

3.2 Stochastic Stock Dynamics

The dynamics of resource stocks is often driven by stochastic elements. Examples include biomass growth subject to random shocks, the replenishment of groundwater aquifers under uncertain precipitation, atmospheric pollution decay varying with changing weather conditions, and oil and mineral reserves subject to uncertain discoveries. The random shocks can come in the form of an ongoing stream of small fluctuations or as abrupt and substantial discrete occurrences. The latter show up, for example, when the resource evolution process undergoes a regime shift, which entails the uncertain T scenario discussed above. Here we consider the continuous flow of small fluctuations giving rise to a diffusion (or random walk) process. As before, uncertainty regarding the stock evolution may be due to genuine random environmental shocks (Reed 1979; Pindyck 1984) or due to incomplete information. For example, the resource owners may be unable to measure the current stock precisely or to follow exactly the optimal extraction rule, leading to errors in predicting the next period's stock (Clark and Kirkwood 1986).

Reed (1974, 1979) considered a biomass stock (e.g., a fish population) Q_t following the discrete-time natural growth rule

Q_{t+1} = Z_t R(Q_t)

where R(·) is the expected stock recruitment and the Z_t are independently and identically distributed unit-mean random variables representing stochastic shocks affecting the population growth in each reproduction season. The resource stock is revealed following the realization of Z_t, yet the future evolution of the stock process cannot be predicted. In general, the concept of a steady state must be replaced by that of a steady state distribution. However, if the realizations of the random shocks are observed before harvest decisions are made, the optimal policy maintains a constant escapement (postharvest biomass); that is, the optimal steady state distribution of escapement degenerates to a constant (Reed 1979). When additional sources of uncertainty (e.g., errors in the measurement of current stocks) are added, the constant escapement rule no longer holds (see Sect. 3.6). A similar stochastic growth rule has been used by Weitzman (2002) to compare fishery regulation via landing fees with (the more common) harvest quotas. He found that the former measure is more effective in this case. Observe that stochastic dynamics is not restricted to the population growth of some biological stock but might be relevant also to nonrenewable mineral stocks as a result of dedicated ongoing exploration efforts for new reserves with uncertain outcomes (Mangel 1985; Deshmukh and Pliska 1985).

Pindyck (1984) formulated the resource management problem under stochastic stock evolution in continuous time, employing Itô's stochastic calculus. The stock evolution follows a diffusion process which evolves according to the stochastic differential equation

dQ = [R(Q) − q] dt + σ(Q) dZ   (4)

where Z is a standard Wiener process and σ²(·) is the corresponding variance. Specifying σ(Q) = σQ, with σ a given constant, gives rise to a geometric Brownian motion and greatly facilitates the analysis. Taking again the expected cumulative net benefit as the objective for optimization, one can employ stochastic dynamic programming to derive the optimal extraction rule q(Q) and the associated steady state distribution. The prudence implications of this type of uncertainty are again ambiguous and depend on the properties of the recharge and benefit functions (see Pindyck 1984 for examples in which the optimal exploitation rule q(Q) increases, remains unchanged, or decreases as the variance parameter σ is increased).

Other examples of resource management under stochastic stock dynamics include Plourde and Yeung (1989), Knapp and Olson (1995), and Wirl (2006). The first considers pollution control when the accumulation process is stochastic due to the random absorption capacity of the ecosystem and finds that a user charge on inputs is preferable to the common "pollution standards" approach; this result is similar to that obtained by Weitzman (2002) in the discrete time setting. The second paper studies groundwater management with stochastic recharge due to uncertain precipitation, while the third studies climate policies under a stochastic global temperature process.
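
A short Euler–Maruyama simulation of Eq. (4) makes the notion of a steady state distribution concrete. The specification below (logistic recharge, a proportional harvest rule q = EQ, and σ(Q) = σQ) is an illustrative assumption, not any particular model from the literature.

```python
import numpy as np

rng = np.random.default_rng(42)
r, K, E, sigma = 0.5, 1.0, 0.2, 0.15
dt, n_burn, n_keep = 0.01, 20_000, 80_000

dZ = rng.normal(0.0, np.sqrt(dt), n_burn + n_keep)   # pre-drawn Wiener increments
Q, draws = 0.5, np.empty(n_keep)
for i in range(n_burn + n_keep):
    drift = (r * Q * (1 - Q / K) - E * Q) * dt       # recharge minus harvest
    Q = max(Q + drift + sigma * Q * dZ[i], 1e-6)     # Euler-Maruyama step
    if i >= n_burn:
        draws[i - n_burn] = Q

print(f"deterministic steady state: {1 - E / r:.2f}")
print(f"simulated long-run stock: mean {draws.mean():.3f}, std {draws.std():.3f}, "
      f"5-95% [{np.quantile(draws, 0.05):.3f}, {np.quantile(draws, 0.95):.3f}]")
```

Instead of settling at the deterministic steady state, the stock wanders persistently within a band around it; the reported quantiles trace out that steady state distribution.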

3.3 Discounting

Effects of discount rate variability are most pronounced when the consequences of resource exploitation extend far into the distant future, such as in climate change or in nuclear waste disposal problems. In such cases, even slight changes in the discount rate entail exceedingly large differences in the weight assigned to the well-being of generations in the distant future, and hence in optimal policies. The discount rate changes with time preferences and technological shocks. Uncertain discounting due to future technological shocks has been analyzed in a number of works (see Gollier and Weitzman 2010 and references therein). Based on the discount rate distribution, an expression for the effective discount rate is derived and shown to decline gradually over time, approaching the lower end of the distribution in the long run. This feature can have large effects on optimal policies since it weighs the far future much more heavily than standard constant-rate discounting does.
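
A worked toy example (numbers invented for illustration) shows the mechanism behind this result: if the constant rate is unknown, say 1% or 5% with equal probability, the certainty-equivalent rate R(t) = −ln E[e^{−rt}]/t declines with the horizon toward the lowest possible rate.

```python
import numpy as np

# Certainty-equivalent discount rate under an uncertain constant rate.
rates, probs = np.array([0.01, 0.05]), np.array([0.5, 0.5])
for t in (10, 50, 100, 200, 400):
    R_t = -np.log(probs @ np.exp(-rates * t)) / t
    print(f"t = {t:3d} years: effective rate = {100 * R_t:.2f}%")
```

At short horizons the effective rate is close to the mean (3%), but by t = 400 it has fallen to roughly 1.2%, because the low-rate scenario dominates the expected discount factor in the far future.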


In light of the large variability observed in intragenerational time preferences, it is expected that the same holds for time preferences across generations. Thus, the time preferences of future generations are highly uncertain. These preferences depend on economic performance, technological progress, and the availability of resources in the far future, and the treatment of the associated uncertainty requires integrating the canonical resource model of Sect. 2 within an economy-wide model. These issues are considered in Sect. 4.

3.4 Instantaneous Benefit

The flow of instantaneous benefit is also likely to be influenced by uncertain shocks, some of which take the form of a stochastic diffusion process, while others are substantial and abrupt. An example of the latter is a sudden drop in demand for the resource as a result of a technological breakthrough (e.g., the effect of the development of fiber-optic communication on the demand for copper transmission lines). Such discrete shocks can be discussed in the context of an uncertain time horizon T.

A benefit diffusion process can be driven by a stochastic stock evolution (via the dependence of u(·) on the stock Q), as discussed in Sect. 3.2, or by benefit-specific fluctuations. An example of the latter is the stochastic demand for a nonrenewable resource introduced by Pindyck (1980). Tsur and Graham-Tomasi (1991) studied renewable groundwater management when the demand for the resource fluctuates with rainfall. They distinguished between two information scenarios, depending on whether groundwater extraction decisions are made before or after the rainfall realization is observed. They also considered the reference case in which rainfall is stable at the mean. By comparing these three scenarios, they were able to define the value of groundwater (the "buffer value") due to its role in mitigating the fluctuations in water supply.

Conrad (1992) considered the control of stock pollutants when the pollution damage follows a geometric Brownian motion, while Xepapadeas (1998) incorporated stochastic benefit shocks within a climate change model. The pollution stock process (atmospheric greenhouse gas concentration) is assumed to follow deterministic dynamics, but the damage it inflicts is modeled again as a diffusion process. The model considers a group of countries with deterministic private emissions and a stochastic public damage which depends on the global stock of pollution. The problem of coordinating emission abatement is analyzed via the optimal stopping methodology under cooperative and noncooperative modes of behavior on the part of the participating countries.

3.5 Post-Planning Value

The post-planning value determines the loss associated with occurrence and hence the degree of effort that is optimally invested in avoiding the event or reducing its occurrence hazard. Uncertainty regarding this value is similar to that associated with the preplanning regime, such as uncertain post-planning stock dynamics or instantaneous benefit. For example, Goeschl and Perino (2009) study R&D efforts to develop a backstop substitute for a polluting resource. The exact nature of the substitute is subject to uncertainty: it is not known in advance whether the backstop technology will eventually turn out to be harmful to the environment as well (a "boomerang"), in which case yet another technology will need to be developed later on, or whether it will solve the pollution problem for good. They show how the probability of either outcome affects the timing of adoption of the new technology.

Problems with long time horizons, such as global climate change, exacerbate the uncertainty regarding the post-planning value. Even if we knew precisely the temperature change a century ahead, it would be extremely hard to estimate the damage such a change would inflict on a future society which will surely differ greatly in its economic, technological, and demographic characteristics from what can be observed or predicted at the present time. Integrated assessment models, discussed in Sect. 4 below, deal with this kind of uncertainty in an ad hoc fashion.

3.6 Compound Uncertainties

The various uncertainty types presented above drive different responses in terms of the changes induced relative to the canonical certainty policy, with the sign of the change depending on the particular type under consideration. It is often of interest to study how the magnitude of these changes depends on uncertainty, when the latter is measured, for instance, by the variance of a related key parameter (e.g., the parameter σ² of Eq. (4)). Typically, each source of uncertainty drives the policy along a well-defined trend, and the effect responds monotonically to changing uncertainty. However, many resource management problems are subject to the combination of more than one type of uncertainty. When two (or more) types of uncertainty are combined, the policy response becomes more involved than in the case of a single type because the interaction between the types can give rise to new phenomena.

Aiming to account for such situations, Clark and Kirkwood (1986) combined Reed's (1979) discrete stochastic fish stock dynamics with measurement errors on the stock size at the beginning of each harvesting period, while Sethi et al. (2005) added a third component, namely, the inaccurate implementation of the harvest policy in each period. They showed that Reed's (1979) constant escapement rule is no longer optimal when harvest decisions are made before realizations of the random shocks are observed, in which case the optimal policy may not admit an analytic solution and the planner must resort to numerical methods.

The effect of the interactions among different types of uncertainty is evident in the work of Saphores (2003), who considered stochastic stock dynamics under the threat of extinction if the biomass hits a barrier and found a non-monotonic response to increasing the stochastic variance: the increase in variance implies more precaution when the variance is small but calls for more aggressive harvesting when the variance is large enough. More recently, Brozović and Schlenker (2011) obtained a similar outcome when the stochastic stock dynamics is combined with the risk of an abrupt shift in ecosystem dynamics. These models allow the planner to take actions at discrete points of time, and the non-monotonic behavior is attributed to the way an increase in variance alters the trade-off between reducing the shift probability and the cost of precautionary behavior.

Leizarowitz and Tsur (2012) studied optimal management of a stochastically replenished (or growing) resource under threat of a catastrophic event such as eutrophication (of shallow lakes), species extinction, or ecosystem collapse. They considered discrete time and discrete state and action spaces. The catastrophic threat renders the single-period discount factor policy dependent, and as a result the compound discount factor becomes history dependent. The authors investigated whether an optimal Markovian-deterministic stationary policy exists for this problem. They answered this question in the affirmative and verified that the optimal state process converges to a steady state distribution. They identified cases under which the steady state distribution implies that the event will eventually occur with probability one and contrasted them with cases under which the catastrophic event will never occur.

Employing a continuous-time formulation, Yin and Newman (1996) combined a stochastic output price process (as in Conrad 1992) with the catastrophic forest fires of Reed (1984) and found that the risk of fire entails different responses depending on whether the fire is a single event that prevents further exploitation or investment, or whether fires can recur. In a similar framework, Balikcioglu et al. (2011, and references therein) combined stochastic pollution stock dynamics (analogous to Eq. (4)) with stochastic uncertainty regarding the damage inflicted by this stock (as in Xepapadeas 1998). The optimal response is analyzed again via stopping theory, and the complexity introduced by the dual source of uncertainty necessitates the development of a sophisticated numerical method of solution.

Zemel (2012) provides an analytic, continuous-time confirmation of the non-monotonic response by incorporating the uncertain regime shifts of de Zeeuw and Zemel (2012) into the stochastic stock model Eq. (4). It is verified that the simultaneous action of both types of uncertainty is indeed required to obtain this behavior. When one or the other source of uncertainty is switched off, the remaining one acts to promote conservation (as expected). However, when the two sources interact, increasing the stochastic variance enhances the hazard effect when the variance is small but works in the opposite direction when the variance is large. In a world of multiple sources of uncertainty, it is therefore likely that non-monotonic responses are more common than the simple, single-uncertainty-type models would suggest.

Obviously, combining several uncertainty sources greatly complicates the management problem, and one usually has to resort to numerical methods to derive the optimal policy. This is the approach adopted by the integrated assessment models discussed in Sect. 4 below.

4 Integrating Natural Resources and Aggregate Growth Models

Some uncertain elements affect resource exploitation indirectly via their influence on economy-wide variables. Examples include the intra- and intergenerational variability of time preferences and technological shocks. Accounting for these uncertain elements requires incorporating the canonical resource model of Sect. 2 within an economy-wide (growth) framework. The approach taken in this section is in line with the views of ecological economists who have pointed out that problems of economic growth cannot be decoupled from the constraints imposed by the embedding environmental system. We briefly outline an integrated model of this kind and use it to discuss additional effects of uncertainty.

4.1 An Integrated Model

An important (though not the only) role of natural resources is to serve as sources of production inputs. Accordingly, suppose the extracted resource q is used as an input of production alongside capital K and human capital augmented labor AL (A is an index of human capital and L represents the labor force) to produce the output Y according to the technology Y = F(K, q, AL). The wealth of an economy is measured by its stocks of natural capital Q, producible capital K, and human capital A. The former changes according to Eq. (1), and K changes according to

K̇ = F(K, q, AL) − C − z(Q, q) − δK   (5)

where C is aggregate consumption, δ is a depreciation parameter, and z(·) is the extraction cost. (In the canonical model of Sect. 2, z(Q, q) is embedded in the instantaneous benefit u(Q, q, t), which is here replaced by the consumption utility.) The evolution of human capital may be driven by exogenous labor-augmenting technical change processes or by endogenous policies. Equation (5), then, can be viewed as a variant of a Solow-type growth model. Per capita consumption, c = C/L, generates the per capita instantaneous utility u(c), and welfare is measured by the present value of the utility stream

\int_0^T L u(c) e^{-\rho t} \, dt + e^{-\rho T} v(Q(T))    (6)

where ρ is the utility discount rate, which discounts future consumption solely due to the passage of time and should be distinguished from the interest rate r (the price of capital). The resource allocation problem is to find the feasible consumption-exploitation-investment policy that maximizes the welfare Eq. (6) subject to the dynamic evolution of the capital stocks, given the endowments Q0, K0, and A0. More general variants of this model allow for multiple resources and for an explicit dependence of the utility also on some of the stocks (e.g., a clean environment or the preservation of species; see Heal and Kriström 2002 and references therein). In equilibrium the optimal policy follows (under some conditions) Ramsey's formula r = ρ + ηg, where η is the elasticity of marginal utility and g is the rate of growth of per capita consumption. This condition varies with intergenerational variations in preferences (ρ and η) and in the growth rate (g). The "correct" rate to be used is controversial (see Stern 2008; Nordhaus 2008, and references therein), and the controversy is exacerbated by the uncertain future evolution of these variables.
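To fix ideas, consider a purely illustrative parameterization (the numbers are ours, not drawn from the literature): with a pure time preference rate ρ = 0.01, an elasticity of marginal utility η = 2, and per capita consumption growth g = 0.02, Ramsey's formula yields r = 0.01 + 2 × 0.02 = 0.05, i.e., a 5% consumption discount rate. Halving the growth forecast to g = 0.01 lowers r to 3%, which illustrates how sensitive the valuation of long-run environmental benefits is to assumptions about these parameters.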

4.2 Uncertainty in the Integrated Model

The integrated model allows us to address a wider range of uncertainties as well as to study feedback effects between natural resources and the wider economy. For example, Tsur and Zemel (2009) looked at the effect of economic growth on climate policy regarding greenhouse gas (GHG) emissions under threat of a catastrophic climate change whose occurrence probability depends on atmospheric GHG concentration. They found that economic growth motivates more vigorous mitigation of GHG emissions, such that in the long run anthropogenic GHG emissions (beyond the natural rate) should be banned altogether. The reason is rather straightforward: as the economy grows richer, it stands to lose more in case the catastrophe strikes, while at the same time it can more easily afford to relinquish the resources needed to use and develop clean substitutes. What is less obvious is that, due to the global public bad nature of the threat induced by atmospheric GHG concentration, the market outcome gives rise to the opposite allocation, namely, maximal (in economic terms) use of polluting fossil fuels. Such an interaction between an economy-wide phenomenon, in the form of economic growth induced by technical change, and resource exploitation affecting the probability of triggering a damaging event can be addressed only within an integrated framework.

As integrated models (particularly those that aim to describe the real world faithfully) tend to be analytically intractable, they call for the use of numerical analysis. Examples are the so-called integrated assessment models that link together climate and aggregate growth models (see Stern 2008; Nordhaus 2008, and references therein). Uncertainty in these models is often treated by considering a distribution for each of the unknown parameters and deriving the results for a large number of "scenarios", each corresponding to a particular parameter specification. The results are then reported in terms of the most likely values as well as some measure of their spread.
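The scenario approach lends itself to a simple Monte Carlo implementation. The sketch below is ours and deliberately toy-sized: it is not any specific integrated assessment model, and the growth-damage recursion and all parameter distributions are invented for illustration. It only demonstrates the mechanics of sampling uncertain parameters, running the model once per draw, and summarizing the spread of outcomes.

import numpy as np

rng = np.random.default_rng(0)
n_scenarios, horizon = 10_000, 100

# Draw one value of each uncertain parameter per scenario (distributions are illustrative).
growth = rng.normal(0.02, 0.01, n_scenarios)                  # per capita growth rate g
climate_sens = rng.lognormal(np.log(3.0), 0.3, n_scenarios)   # warming per CO2 doubling
damage_coef = rng.uniform(0.001, 0.005, n_scenarios)          # damage share per degree squared

outcomes = np.empty(n_scenarios)
for s in range(n_scenarios):
    y, temp = 1.0, 1.0
    for t in range(horizon):
        temp += 0.02 * climate_sens[s]                        # toy temperature trajectory
        y *= (1 + growth[s]) * (1 - damage_coef[s] * temp**2)
    outcomes[s] = y

# Report the central tendency and the spread across scenarios.
print("median terminal output:", np.median(outcomes))
print("5th-95th percentile band:", np.percentile(outcomes, [5, 95]))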

5 Irreversibility and Uncertainty

A ubiquitous feature of environmental management problems is the irreversibility characterizing many natural processes. This feature can come in the form of the abrupt catastrophic occurrences discussed above (examples of which are the reversal of the flow of the Gulf Stream due to global temperature rise, species extinction due to overharvesting or habitat destruction, the collapse of groundwater aquifers due to seawater intrusion, or the eutrophication of lakes as a result of the use of fertilizers along their shores). In other cases, some of our actions (polluting emissions, forest clearing, or the extraction of exhaustible resources) cannot be undone (or can be corrected only very slowly) once an unfavorable outcome is realized. These irreversible regime shifts are manifestations of non-convexities in the dynamic equations that drive the underlying natural processes. This feature implies fast transitions among competing stable equilibria and hysteresis phenomena. As stated above, the simplified description of these phenomena as irreversible transitions provides a useful approximation for deriving the management policies.

The presence of irreversibility really matters only under uncertainty, because otherwise undesirable outcomes can be anticipated in advance and avoided. Heal and Kriström (2002), Pindyck (2007), and the references they cite discuss in detail the effect of irreversibility on management policies under uncertainty. Presenting the problem in terms of the theory of real options, they identify two diametrically opposed effects. If the damage associated with occurrence turns out in the future to be very large, then exercising the option of aggressive extraction today entails a significant social loss. This effect pushes the cost-benefit balance towards more conservation. However, abatement activities often involve sunk costs (e.g., the purchase of abatement equipment that can be used only for that purpose), which give rise to the opposite effect. If it eventually turns out that the occurrence hazard or the associated damage has been overestimated, the abatement investment cannot be undone, and failing to exercise the option to wait and learn more about the hovering threat might prove costly.

The irreversibility-induced trade-offs are particularly pronounced in optimal stopping problems (e.g., Balikcioglu et al. 2011 and the references they cite), where the problem is to determine the optimal time to enact an irreversible change in policy (e.g., reduce emissions) at a sunk cost when the pollution and damage processes follow stochastic dynamics. This regime shift problem is reminiscent of the uncertain regime shift time T discussed in Sect. 3.1. Here, however, the time of shift is the decision variable rather than an exogenous parameter subject to uncertainty. Optimal stopping has also been used to study the optimal time to invest in R&D efforts aimed at developing a substitute for a nonrenewable resource (Hung and Quyen 1993) or for a polluting technology (Goeschl and Perino 2009). Wirl (2006) considered the consequences of two types of irreversibility on optimal CO2 emission policies when the temperature follows a diffusion process. First, emissions are irreversible in the sense that active collection of the polluting gases out of the atmosphere is not allowed. Moreover, stopping is irreversible, so that once the decision to stop emissions is taken, it cannot be reversed. He found that these effects work against conservation and that irreversible stopping is never optimal.
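A discrete-time caricature helps make the stopping trade-off concrete. The following sketch is ours, not the method of Balikcioglu et al. (2011): all functional forms and parameter values are invented for illustration. Each period the planner either continues emitting (collecting a benefit b while the pollution stock drifts stochastically) or pays a one-time sunk cost K to stop emissions forever, after which the stock simply decays. Value iteration on a grid recovers the threshold stock at which stopping becomes optimal.

import numpy as np

beta, delta = 0.95, 0.10          # discount factor, natural decay rate
b, c = 1.0, 0.05                  # flow benefit of emitting, damage curvature
K, e = 5.0, 1.0                   # sunk cost of stopping, emission flow while active
P = np.linspace(0.0, 20.0, 401)   # pollution-stock grid
shocks = np.array([-0.5, 0.0, 0.5])   # discretized stochastic recharge
probs = np.array([0.25, 0.5, 0.25])

# Value of stopping: pay K once; the stock then decays and quadratic damages fade out.
V_stop = -K - 0.5 * c * P**2 / (1.0 - beta * (1.0 - delta)**2)

V = V_stop.copy()
for _ in range(2000):
    P_next = np.clip((1.0 - delta) * P[:, None] + e + shocks[None, :], P[0], P[-1])
    EV = (np.interp(P_next, P, V) * probs).sum(axis=1)   # expected continuation value
    V_cont = b - 0.5 * c * P**2 + beta * EV
    V_new = np.maximum(V_stop, V_cont)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

stop = V_stop >= V_cont
print("optimal to stop once the stock exceeds:",
      P[stop].min() if stop.any() else "never")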

6 Knightian Uncertainty

The literature cited so far treats uncertainty by converting random variables into expectations based on well-specified distribution functions. Often, however, the distribution functions themselves are only partially known – a situation referred to as Knightian (or structural) uncertainty. For example, as perceived at present, future growth rates may be random with unknown mean and/or standard deviation. When realizations of an informative random variable are progressively observed, the underlying distribution can be deduced via Bayesian updating with progressive levels of accuracy. However, if the downside of possible outcomes (e.g., the consequences of a climate-change-induced catastrophe) is not bounded, the expected present value may be unbounded as well whenever the Bayesian updated (posterior) probabilities rest on incomplete information (a finite number of observations). This situation was illustrated by Weitzman (2009) in a two-period model in which growth is random (due to a random climate parameter) with a distribution that is known only up to a scale parameter. The analysis points to the potential limitations of combining expected utility theory and Bayesian updating in analyzing decisions under uncertainty in general and for resource management in particular. Alternative approaches, involving the precautionary principle and ambiguity-averse learners, have recently been considered for resource management problems (see Vardas and Xepapadeas 2010 and references therein).
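The fat-tail mechanism behind this result can be illustrated numerically. The sketch below is ours and not Weitzman's model: it merely observes that when the variance of a normally distributed growth rate must itself be estimated from n observations, the predictive distribution (under a noninformative prior) is a fat-tailed Student t rather than a normal, so extreme downside realizations retain far more probability mass than a fitted normal would suggest.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 20                                  # a short sample of growth observations
data = rng.normal(0.02, 0.03, n)        # "true" growth process (illustrative numbers)

xbar, s = data.mean(), data.std(ddof=1)
scale = s * np.sqrt(1.0 + 1.0 / n)      # predictive scale under a noninformative prior

# Predictive distribution for the next observation: Student t with n-1 degrees of freedom.
predictive = stats.t(df=n - 1, loc=xbar, scale=scale)
fitted_normal = stats.norm(loc=xbar, scale=scale)

threshold = xbar - 6 * scale            # a "catastrophic" downside realization
print("P(catastrophe), fat-tailed predictive:", predictive.cdf(threshold))
print("P(catastrophe), fitted normal:        ", fitted_normal.cdf(threshold))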

7 Conclusions

The proper response to uncertainty has become a prevailing consideration in the resource management literature, and the survey in this chapter attempts to expose the diversity of approaches developed for this purpose. A necessary step in dealing with uncertainty is the recognition that uncertainty is present in nearly every aspect of a resource management problem and that different types of uncertainty call for policy responses that may differ substantially and in some cases even diametrically. For example, some types of uncertainty encourage more conservation and cautious exploitation, while other types induce the opposite response of more vigorous exploitation (relative to the comparable situation managed under certainty).

Although our aim was to cover the wide range of stochastic aspects relevant for environmental and natural resources management, it is recognized that a comprehensive treatment is not feasible within the limits of a single chapter, and some important aspects had to be left out. For example, environmental resources are often shared by several agents, and their management is subject to strategic interactions among competing stakeholders. These interactions are usually studied via the theory of dynamic games and again involve uncertainty of various types, including that due to asymmetric information among players (see Dockner et al. 2000 and the literature cited therein). The treatment of this important and complex topic is beyond the scope of this chapter.

References

Aronsson T, Backlund K, Löfgren KG (1998) Nuclear power, externalities and non-standard Pigouvian taxes: a dynamic analysis under uncertainty. Environ Resour Econ 11:177–195
Bahn O, Haurie A, Malhamé R (2008) A stochastic control model for optimal timing of climate policies. Automatica 44:1545–1558
Balikcioglu M, Fackler PL, Pindyck RS (2011) Solving optimal timing problems in environmental economics. Resour Energy Econ 33:761–768
Brozović N, Schlenker W (2011) Optimal management of an ecosystem with an unknown threshold. Ecol Econ 70:627–640
Clark CW (1976) Mathematical bioeconomics: the optimal management of renewable resources. Wiley, New York
Clark CW, Kirkwood GP (1986) On uncertain renewable resource stocks: optimal harvest policies and the value of stock surveys. J Environ Econ Manag 13:235–244
Clarke HR, Reed WJ (1994) Consumption/pollution tradeoffs in an environment vulnerable to pollution-related catastrophic collapse. J Econ Dyn Control 18:991–1010
Conrad JM (1992) Stopping rules and the control of stock pollutants. Nat Res Model 6:315–327
Cropper ML (1976) Regulating activities with catastrophic environmental effects. J Environ Econ Manag 3:1–15
Dasgupta P, Heal G (1974) The optimal depletion of exhaustible resources. Rev Econ Stud 41:3–28
Dasgupta P, Heal GM (1979) Economic theory and exhaustible resources. Cambridge University Press, Cambridge
Dasgupta P, Stiglitz J (1981) Resource depletion under technological uncertainty. Econometrica 49:85–104
Dasgupta P, Heal G, Majumdar M (1977) Resource depletion and research and development. In: Intriligator MD (ed) Frontiers of quantitative economics, vol III B. North-Holland, Amsterdam
Davison R (1978) Optimal depletion of an exhaustible resource with research and development towards an alternative technology. Rev Econ Stud 45:355–367
de Zeeuw A, Zemel A (2012) Regime shifts and uncertainty in pollution control. J Econ Dyn Control 36:939–950
Deshmukh SD, Pliska SR (1985) A martingale characterization of the price of a nonrenewable resource with decisions involving uncertainty. J Econ Theory 35:322–342
Dockner EJ, Jorgensen S, Long NV, Sorger G (2000) Differential games in economics and management science. Cambridge University Press, Cambridge
Gjerde J, Grepperud S, Kverndokk S (1999) Optimal climate policy under the possibility of a catastrophe. Resour Energy Econ 21:289–317
Goeschl T, Perino G (2009) On backstops and boomerangs: environmental R&D under technological uncertainty. Energy Econ 31:800–809
Gollier C, Weitzman ML (2010) How should the distant future be discounted when discount rates are uncertain? Econ Lett 107:350–353
Heal G (1984) Interactions between economy and climate: a framework for policy design under uncertainty. Adv Appl Microecon 3:151–168
Heal G, Kriström B (2002) Uncertainty and climate change. Environ Resour Econ 22:3–39
Hotelling H (1931) The economics of exhaustible resources. J Polit Econ 39:137–175
Hung NM, Quyen NV (1993) On R&D timing under uncertainty: the case of exhaustible resource substitution. J Econ Dyn Control 17:971–991
Kamien MI, Schwartz NL (1978) Optimal exhaustible resource depletion with endogenous technical change. Rev Econ Stud 45:179–196
Karp L, Tsur Y (2011) Time perspective and climate change policy. J Environ Econ Manag 62:1–14
Kemp MC (1976) How to eat a cake of unknown size. In: Kemp MC (ed) Three topics in the theory of international trade. North-Holland, Amsterdam
Knapp K, Olson L (1995) The economics of conjunctive groundwater management with stochastic surface supplies. J Environ Econ Manag 28:340–356
Leizarowitz A, Tsur Y (2012) Renewable resource management with stochastic recharge and environmental threats. J Econ Dyn Control 36:736–753
Long NV (1975) Resource extraction under the uncertainty about possible nationalization. J Econ Theory 10:42–53
Mangel M (1985) Decision and control in uncertain resource systems. Academic Press, Orlando
Nævdal E (2006) Dynamic optimization in the presence of threshold effects when the location of the threshold is uncertain – with an application to a possible disintegration of the Western Antarctic Ice Sheet. J Econ Dyn Control 30:1131–1158
Nordhaus WD (2008) A question of balance: weighing the options on global warming policies. Yale University Press, New Haven
Pindyck RS (1980) Uncertainty and exhaustible resource markets. J Polit Econ 88:1203–1225
Pindyck RS (1984) Uncertainty in the theory of renewable resource markets. Rev Econ Stud 51:289–303
Pindyck RS (2007) Uncertainty in environmental economics. Rev Environ Econ Policy 1:45–65
Plourde C, Yeung D (1989) A model of industrial pollution in a stochastic environment. J Environ Econ Manag 16:97–105
Polasky S, de Zeeuw A, Wagener F (2011) Optimal management with potential regime shifts. J Environ Econ Manag 62:229–240
Reed WJ (1974) A stochastic model for the economic management of a renewable animal resource. Math Biosci 22:313–337
Reed WJ (1979) Optimal escapement levels in stochastic and deterministic harvesting models. J Environ Econ Manag 6:350–363
Reed WJ (1984) The effect of the risk of fire on the optimal rotation of a forest. J Environ Econ Manag 11:180–190
Reed WJ, Heras HE (1992) The conservation and exploitation of vulnerable resources. Bull Math Biol 54:185–207
Saphores JD (2003) Harvesting a renewable resource under uncertainty. J Econ Dyn Control 28:509–529
Sethi G, Costello C, Fisher A, Hanemann M, Karp L (2005) Fishery management under multiple uncertainty. J Environ Econ Manag 50:300–318
Stern N (2008) The economics of climate change. Am Econ Rev 98:1–37
Tsur Y, Graham-Tomasi T (1991) The buffer value of groundwater with stochastic surface water supplies. J Environ Econ Manag 21:201–224
Tsur Y, Zemel A (1995) Uncertainty and irreversibility in groundwater resource management. J Environ Econ Manag 29:149–161
Tsur Y, Zemel A (1996) Accounting for global warming risks: resource management under event uncertainty. J Econ Dyn Control 20:1289–1305
Tsur Y, Zemel A (1998) Pollution control in an uncertain environment. J Econ Dyn Control 22:967–975
Tsur Y, Zemel A (2009) Endogenous discounting and climate policy. Environ Resour Econ 44:507–520
Vardas G, Xepapadeas A (2010) Model uncertainty, ambiguity and the precautionary principle: implications for biodiversity management. Environ Resour Econ 45:379–404
Weitzman ML (2002) Landing fees vs harvest quotas with uncertain fish stocks. J Environ Econ Manag 43:325–338
Weitzman ML (2009) On modeling and interpreting the economics of catastrophic climate change. Rev Econ Stat 91:1–19
Wirl F (2006) Consequences of irreversibilities on optimal intertemporal CO2 emission policies under uncertainty. Resour Energy Econ 28:105–123
Xepapadeas A (1998) Policy adoption rules and global warming. Environ Resour Econ 11:635–646
Yin R, Newman DH (1996) The effect of catastrophic risk on forest investment decisions. J Environ Econ Manag 31:186–197
Zemel A (2012) Precaution under mixed uncertainty: implications for environmental management. Resour Energy Econ 34:188–197

Game Theoretic Modeling in Environmental and Resource Economics

Hassan Benchekroun and Ngo Van Long

Contents
1 Introduction
2 Static Games
2.1 The Emissions Game
2.2 Sustaining Cooperation in a Noncooperative World
3 Dynamic Games: Some Concepts
4 Transboundary Stock Pollutants
4.1 A Benchmark Model
4.2 Centralized Versus Regional Control of Pollution
5 Provision of Clean Air and Interregional Mobility of Capital
6 Conclusion
References

Abstract

We cover applications of game theory in environmental and resource economics with a particular emphasis on noncooperative transboundary pollution and resource games. Both flow and stock pollutants are considered. Equilibrium concepts in static and dynamic games are reviewed. We present an application of game theoretical tools related to the formation and sustainability of cooperation in transboundary pollution games. We discuss the analytical tools relevant for the case of a stock pollutant and offer an application related to the optimal institutional arrangement to regulate a pollutant when several jurisdictions are involved.

Keywords

Nash equilibrium · Dynamic game · Grand coalition · Equilibrium payoff · International environmental agreement

H. Benchekroun (*) · N. Van Long
Department of Economics and CIREQ, McGill University, Montréal, QC, Canada
e-mail: [email protected]; [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_52


1 Introduction

Static and dynamic games have offered important tools to study many strategic interactions in natural resource and environmental economics as well as regional science and management science. The main difference between static games and dynamic games is that the latter deal with situations where economic agents operate in an environment that changes over time and agents can influence the evolution of that environment. In analyzing any problem of strategic interactions, it is usually better to begin with the simplest model. This often means that one should, as a first step, abstract from dynamic considerations. Static game theory is sufficiently rich to shed light on many scenarios of social and economic interactions. On the other hand, many problems in economics are temporal by nature, and eventually the temporal dimension must be taken into account. For this reason, dynamic game models are often encountered in scientific journals in fields such as resource and environmental economics and regional and urban economics.

Some warnings are in order. The environmental economics literature with a special interest in strategic behavior between regions is large. Since this chapter seeks to be self-contained, and given the space limitation, the material presented should be seen as a sample of the application of game theoretic tools to important classes of regional environmental and resource economics problems in a multiregion context. In particular, we shall omit applications of cooperative game theory and only present a selection of noncooperative game theoretic models. For recent surveys of applications of game theory in environmental economics, we refer the reader to Jorgensen et al. (2010) and Long (2011).

In Sect. 2, we consider within a static model the issue of environmental agreements. In Sect. 3, we turn to dynamic games with simultaneous moves and briefly explain various equilibrium concepts in dynamic games. In Sects. 4 and 5, we provide illustrations of these concepts applied to problems of natural resource and environmental economics in a multiregion setting.

2 Static Games

Static game theory is particularly well suited to transboundary pollution problems involving flow emissions, where a few players interact strategically. We present an application of game theoretical tools related to the formation and sustainability of cooperation in pollution games. We review emission games and abatement games and the comprehensive analytical treatment by Rubio and Ulph (2006) of the canonical model of international environmental agreements initiated by Barrett (1994). We begin with a noncooperative game of emissions. Then we turn to the question of how cooperation can be achieved and an analysis of stable coalitions. Note that while the models are presented in the case of interaction among countries, they apply also to the case of a single country made up of several regions with autonomous regulatory powers over pollution and resource use, as is the case in many countries. The main feature of these problems is the absence of a supranational authority or the lack of constitutional power of a central authority such as a federal government.

2.1 The Emissions Game

Consider a world consisting of N countries, i = 1, . . . , N. A strategy for country i is a nonnegative level of emissions q_i ≥ 0. Country i derives a net benefit flow

\pi_i(q_i, Q_{-i}) = a q_i - \frac{b}{2} q_i^2 - \frac{1}{2} (q_i + Q_{-i})^2

where Q_{-i} ≡ Σ_{k≠i} q_k and a and b are two positive parameters. The term a q_i − (b/2) q_i² measures the gross benefit from consumption, and the term (1/2)(q_i + Q_{-i})² measures the environmental damages each country suffers from the total emissions Q. Note that since the marginal damage from emissions is normalized to one, a large value of b represents a large marginal benefit or a small marginal damage cost. Assuming countries choose their actions simultaneously, the unique Nash equilibrium strategy is

q_i^* = q^* = \frac{a}{b + N}

and the equilibrium payoff is

\pi^* = \frac{1}{2} \left( \frac{-N^2 + 2N + b}{(N + b)^2} \right) a^2.

Let q^c denote the level of emissions that maximizes world welfare. Then,

q^c = \frac{a}{b + N^2} < q^* = \frac{a}{b + N}.

Clearly, welfare under cooperation is higher than the Nash equilibrium level,

\pi^c = \frac{1}{2} \frac{a^2}{N^2 + b} > \pi^* = \frac{1}{2} \left( \frac{-N^2 + 2N + b}{(N + b)^2} \right) a^2

\pi^c - \pi^* = \frac{1}{2} \frac{N^2 (N - 1)^2 a^2}{(N^2 + b)(N + b)^2}.

Note that the gains from cooperation are decreasing in b. The gains from cooperation are most substantial when b → 0. The above game, analyzed in Rubio and Ulph (2006), is a game of pollution emissions, which can be compared with the abatement game of Barrett (1994). (There exists a correspondence between the emissions game and an abatement game; see Appendix 1 in Rubio and Ulph (2006).)
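A few lines of code can confirm these closed forms by brute force. The sketch below is ours and purely illustrative: it computes the simultaneous-move equilibrium by iterating (damped) best responses and compares the result with the formulas above for arbitrary parameter values.

import numpy as np

a, b, N = 10.0, 2.0, 5

# Damped best-response iteration for the simultaneous-move emissions game.
q = np.full(N, a / (b + N + 1))          # arbitrary nonnegative starting point
for _ in range(2000):
    Q = q.sum()
    # Country i's best response to Q_{-i}: max of a*q_i - b/2*q_i^2 - 1/2*(q_i + Q_{-i})^2
    q_new = np.maximum((a - (Q - q)) / (b + 1.0), 0.0)
    q = 0.8 * q + 0.2 * q_new            # damping guarantees convergence of the iteration

print("iterated best responses:", q[0])
print("closed form a/(b+N):    ", a / (b + N))
print("cooperative a/(b+N^2):  ", a / (b + N**2))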

2.2 Sustaining Cooperation in a Noncooperative World

We have shown that the noncooperative outcome is inefficient. Let us consider a possible improvement by some form of cooperation. Suppose a subgroup of the players considers coordinating their strategies to improve on their noncooperative equilibrium payoff. We define an international environmental agreement (IEA) as cooperation among M countries, where M ≥ 2. Assuming the nonexistence of a supranational authority, we require any agreement that improves on the Nash equilibrium outcome to be self-sustaining. We formulate an IEA game as a metagame in which an emissions game (or an abatement game) is preceded by an initial stage where countries decide whether or not to join a coalition. An IEA consisting of M ≤ N members chooses a vector of M strategies, one for each coalition member, to maximize the sum of their payoffs. When M = N, the coalition is called the grand coalition.

Several criteria of stability have been proposed in the theory of coalition formation. The predominant stability criterion in the IEA literature uses the concepts of internal and external stability. This criterion is based on the assumption that when a country considers the gain from defection, it supposes that all countries remaining in the IEA would continue to cooperate and maximize their joint welfare. (An alternative stability criterion is that of farsighted stability; for more details and references, see Benchekroun and Long (2011), Sect. 4.2.) Let π^s(M) and π^ns(M) denote the equilibrium payoffs of the representative signatory and non-signatory countries. We say that a given IEA with M members is internally stable if no signatory gains by leaving the IEA, i.e., π^s(M) ≥ π^ns(M − 1). Similarly, external stability means that a non-signatory does not gain by joining the IEA, i.e., π^ns(M) ≥ π^s(M + 1). An IEA is stable if and only if it is both internally and externally stable.

Once an IEA has been formed, in the Stage 2 game one may consider two scenarios: (a) IEA members and non-signatories choose their actions simultaneously, or (b) IEA members are the first movers, announcing and committing to their emissions policies before non-signatories can act. Most papers in the literature prefer the second scenario, i.e., the IEA members play a leadership role in the emissions game. We report below the analysis of the second scenario, following Rubio and Ulph (2006).

2.2.1 Stage 2: The Emissions Game Revisited

Using backward induction, let us first determine the reaction function of the non-signatories. Suppose the first M countries are signatories. A non-signatory country k > M seeks to solve

\max_{q_k \geq 0} \left\{ a q_k - \frac{b}{2} q_k^2 - \frac{1}{2} (q_k + Q_{-k})^2 \right\}.

Its reaction function is

q_k = \max \left\{ \frac{a - Q_{-k}}{b + 1}, \; 0 \right\}.    (1)

Knowing the reaction function of the non-signatories, the collection of signatories chooses their emissions to maximize the sum of their payoffs,

\max_{q_1, \ldots, q_M \geq 0} \sum_{i=1}^{M} \left[ a q_i - \frac{b}{2} q_i^2 - \frac{1}{2} (q_i + Q_{-i})^2 \right]

subject to the reaction functions of the non-signatories. Under the symmetry assumption among coalition members, the maximization problem becomes

\max_{q_s \geq 0} M \left[ a q_s - \frac{b}{2} q_s^2 - \frac{1}{2} (M q_s + (N - M) q_{ns})^2 \right]

subject to

q_{ns} = \max \left\{ \frac{a - M q_s}{b + N - M}, \; 0 \right\}

where q_s and q_{ns} denote, respectively, the emissions of a signatory and of a non-signatory country. Following Rubio and Ulph (2006), consider three possibilities, depending on interior or corner solutions. For this purpose, define

g(b, M) \equiv b^2 - (N - M)(M - 2) b + (N - M)^2

and

h(b, M) \equiv b^2 + (N + M^2 - 2M) b - (N - M) M.

The three possible cases are as follows.

(a) Interior solutions for all countries. This occurs if and only if g(b, M) > 0 and h(b, M) > 0. The interior solutions are given by

q_s = \frac{a \, g(b, M)}{b \, \omega}  and  q_{ns} = \frac{a \, h(b, M)}{b \, \omega}

where

\omega \equiv (b + N - M)^2 + b M^2.

The equilibrium payoffs are given by

\pi^s(M) = \frac{a^2}{2b} \left( 1 - \frac{b N^2}{\omega} \right)  and  \pi^{ns}(M) = \frac{a^2}{2b} \left( 1 - \frac{(b + 1) N^2 (b + N - M)^2}{\omega^2} \right).

(b) Corner solution for the signatories. This occurs when g(b, M) ≤ 0. Then q_s = 0 and

q_{ns} = \frac{a}{b + N - M}

with equilibrium payoffs

\pi^s(M) = - \frac{a^2 (N - M)^2}{2 (b + N - M)^2}  and  \pi^{ns}(M) = \frac{a^2 \left( b - (N - M)(N - M - 2) \right)}{2 (b + N - M)^2}.

(c) Corner solution for the non-signatories. This occurs if h(b, M) ≤ 0. Then q_{ns} = 0 and

q_s = \frac{a}{M}

with equilibrium payoffs

\pi^s(M) = - \frac{a^2 (b + M(M - 2))}{2 M^2}  and  \pi^{ns}(M) = - \frac{a^2}{2}.

Since the functions g and h cannot both be negative, the configuration q_s = q_ns = 0 is not possible in a Stackelberg equilibrium. This is because the marginal benefit at a zero pollution level is equal to a > 0, whereas the marginal damage of pollution at zero is nil. Finally, note that when M = 1 or M = N, the solution is interior. Which solution occurs depends on b, N, and M. The creation of a coalition or a change in the size of a coalition can result in a change from an interior to a corner solution or vice versa. Therefore, before tackling the issue of coalition formation, it is important to clarify how the signs of the functions g and h depend on the model parameters. Rubio and Ulph fix N and study the signs of g and h as functions of b and M. The analysis can be summed up in the figures below. Figure 1 depicts the level curve g(b, M) = 0; the interior of the convex region depicted represents all (b, M) such that signatories choose zero emissions (case (b) above). Figure 2 depicts the level curve h(b, M) = 0; the interior of the region depicted represents all (b, M) such that non-signatories choose zero emissions (case (c) above).

Fig. 1 Contour plot of g(b, M ) when N = 10


Fig. 2 Contour plot of h(b, M )
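The Stage 2 equilibrium and the stability conditions of Sect. 2.2 can be explored numerically. The sketch below is ours, not Rubio and Ulph's code: for each coalition size M it computes the Stackelberg equilibrium of the emissions game directly (via a bounded scalar optimization of the signatories' joint payoff against the non-signatories' reaction function) and then scans for internally and externally stable coalition sizes.

import numpy as np
from scipy.optimize import minimize_scalar

a, b, N = 10.0, 1.0, 10

def stage2(M):
    """Stackelberg payoffs (per signatory, per non-signatory) for coalition size M."""
    def q_ns(q_s):
        return max((a - M * q_s) / (b + N - M), 0.0) if M < N else 0.0
    def neg_coalition_payoff(q_s):
        Q = M * q_s + (N - M) * q_ns(q_s)
        return -M * (a * q_s - 0.5 * b * q_s**2 - 0.5 * Q**2)
    res = minimize_scalar(neg_coalition_payoff, bounds=(0.0, a / b), method="bounded")
    qs, qn = res.x, q_ns(res.x)
    Q = M * qs + (N - M) * qn
    pi_s = a * qs - 0.5 * b * qs**2 - 0.5 * Q**2
    pi_ns = a * qn - 0.5 * b * qn**2 - 0.5 * Q**2
    return pi_s, pi_ns

for M in range(2, N + 1):
    internal = stage2(M)[0] >= stage2(M - 1)[1]             # no signatory wants to leave
    external = M == N or stage2(M)[1] >= stage2(M + 1)[0]   # no outsider wants to join
    if internal and external:
        print("stable IEA size:", M)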

2.2.2 Stage 1: The IEA Game

We now turn to the analysis of stable coalitions within the emissions game.

Proposition: There exist b₁(N) and b₂(N) such that:

(a) If b < b₁(N − 1), the unique stable IEA of the Stackelberg model with nonnegative emissions is the grand coalition (Proposition 3 in Rubio and Ulph (2006)).
(b) If b ∈ [b₂(N − 2), b₂(4) = N − 4], there exists an upper bound, given by the smallest integer no less than n₃, on the size of a self-enforcing IEA. This upper bound decreases with b (Proposition 4 in Rubio and Ulph (2006)).
(c) If N > 5 and b > N − 4, the maximum level of cooperation that can be achieved by a self-enforcing IEA is three (Proposition 5 in Rubio and Ulph (2006)).
(d) If b is large enough, the equilibrium is interior and the largest size of a stable coalition is two.

From the result above, one can conjecture that the size of the largest stable IEA is a decreasing function of b. Thus, it is possible to obtain the grand coalition as a stable IEA, and this occurs when b is small enough. The possibility of a stable grand coalition is due to the leadership advantage of the IEA. Emissions are strategic substitutes (i.e., best responses are downward sloping). This in itself gives an incentive for a leader to increase its quantities relative to the case where it moves simultaneously with the non-signatories. The instability in the scenario where countries move simultaneously is due to the reaction of the outsider, who increases its emissions following the creation of an IEA of (N − 1) members. When the IEA is a leader, it decreases its overall level of emissions by a smaller amount than if it were not a leader (in which case it possibly increases its emissions), and therefore the outsider's increase in emissions is smaller than under the simultaneous-move game (where an outsider possibly decreases its emissions). This moderate reaction of the outsider is the reason why a grand coalition can be stable under a leadership model.

The externality of pollution induces the IEA (the leader) to reduce its emissions relative to the Nash equilibrium. In fact, it can be shown that under the Nash equilibrium, an IEA may well end up with higher emissions, resulting in a decrease of non-signatory emissions (possibly to a zero level). This is more likely to happen when b is small, which explains the sustainability of the grand coalition for small values of b. It is important to note that in a Nash equilibrium the payoff of non-signatories is always larger than that of signatories, whereas in a Stackelberg equilibrium this is no longer necessarily true. Interestingly, in the emissions game, the range of parameters (b, M) under which an IEA is sustainable corresponds to the range where the gains from cooperation are the largest.

3 Dynamic Games: Some Concepts

Natural resource and environmental problems usually involve interactions in a changing physical environment. Therefore, dynamic games are well suited for the analysis of many resource and environmental problems. (For a recent comprehensive survey of dynamic games in the economics of pollution, see Jorgensen et al. (2010); Long (2011) surveys dynamic games in natural resources.) Dynamic games are also called state-space games. In a state-space game, the environment is represented by a vector of state variables, which directly or indirectly affect the payoffs of agents. Agents influence the evolution of the state variables by using their control variables. A dynamic game can be formulated in discrete time or in continuous time (see Long (2010, 2011) for surveys of models of both types).

A dynamic game normally displays the following properties. Players receive a flow of benefits every period (or at every point of time). The overall payoff for a player is the sum (or integral) of its discounted flow of benefits over the time horizon. The benefit flow that a player receives in a period may depend on the current actions taken and on the "state of the system" in that period, as represented by the state variables. The state of the system changes over time, depending on the actions of the players. A difference equation or a differential equation describes the rate of change of each state variable. The term "differential games" is broadly interpreted to include both dynamic games in continuous time and those in discrete time, where the evolution of each state variable is described by a difference equation.

Below is a description of a differential game in continuous time (see Dockner et al. (2000) for a more precise formulation). Time is represented by t. The game starts at time zero and ends at time T. There are n state variables, denoted by x_i, where i = 1, 2, . . . , n. The vector of state variables is x = (x_1, x_2, . . . , x_n) ∈ X ⊆ R^n. The set S ≡ X × [0, T] is called the state-date space, and an element (x, t) is called a (state, date) pair. The number of players is an integer N. Player j has a vector of m control variables, denoted by u_j. Assume that u_j(t) ∈ U_j ⊆ R^m. We call U_j player j's control space and define U ≡ Π_j U_j. The evolution of the system is described by a system of n differential equations,

\dot{x}_i(t) = F_i(x(t), u_1(t), u_2(t), \ldots, u_N(t), t),  i = 1, 2, \ldots, n

where x_i(0) = x_{i0} is given. In vector notation,

\dot{x}(t) = F(x(t), u_1(t), u_2(t), \ldots, u_N(t), t).

Player j's instantaneous flow of benefits at time t is

b_j(t) = B_j(x(t), u_1(t), u_2(t), \ldots, u_N(t), t).

The time argument t will be suppressed when there is no risk of confusion. The overall payoff of player j is

\int_0^T e^{-r_j t} B_j(x, u_1, u_2, \ldots, u_N, t) \, dt + e^{-r_j T} S_j(x_T, T)

where S_j(x_T, T) is called the "salvage function" and r_j ≥ 0 is the discount rate of player j.

A player can be a firm, a government, an individual, etc. Each player j maximizes its overall payoff. In order to do this, it must have some idea about what the other players are doing. A Nash equilibrium is a strategy profile such that each player's strategy maximizes its own overall payoff given what is predicted for the other players. (We focus on the case of simultaneous-move games because of space limitations. Games where agents play sequentially are called Stackelberg games; see, e.g., Benchekroun and Long (2011), Sect. 4.2, for more details on Stackelberg dynamic games in natural resource and environmental economics.) Such a prediction depends on what strategy space each player is restricted to. Consider two types of strategies: path strategies and Markovian decision-rule strategies (or feedback strategies).

A path strategy (or open-loop strategy) p_j is a function that determines player j's actions at each time t as a function of t and of the parameters of the model, including the initial stocks; this function does not include the current value of the state variables. It is as if each player makes a commitment, right at the beginning of the game, never to deviate from its planned time path of actions. Let P_j be the set of open-loop strategies that are available to player j, and let P ≡ Π_j P_j. Once all players have chosen their open-loop strategies, the evolution of the state variables is described by

\dot{x}_i(t) = F_i(x(t), p_1(t), p_2(t), \ldots, p_N(t), t),  i = 1, 2, \ldots, n,  x_i(0) = x_{i0}

or, in vector notation,

\dot{x}(t) = F(x(t), p(t), t),  x(0) = x_0

where p(t) ≡ (p_1(t), p_2(t), . . . , p_N(t)). Assume this equation has a unique solution x(t). The overall payoff for player j is then

W_j(x_0, p) = \int_0^T e^{-r_j t} B_j(x(t), p(t), t) \, dt + e^{-r_j T} S_j(x_T, T).

Define an open-loop Nash equilibrium (OLNE) as a strategy profile \hat{p} = (\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_N) ∈ P such that no player can make itself better off by choosing a different open-loop strategy, i.e.,

W_j(x_0, \hat{p}) \geq W_j(x_0, p_j, \hat{p}_{-j})  for all j.

To find an open-loop Nash equilibrium, one uses the maximum principle to derive the necessary conditions of each player's optimal control problem, taking as given the time path of the vector of control variables of the other players. Then one finds a fixed point \hat{p} such that the necessary conditions for all players are satisfied. Next, one verifies that the sufficient conditions are satisfied at that fixed point.

One of the main advantages of the concept of open-loop Nash equilibrium is that such an equilibrium is relatively easy to find. Open-loop Nash equilibria are also attractive because they are time consistent. To see this, suppose the game is played and everyone has followed its Nash equilibrium strategy. Suppose at some time t_1 > 0, when the state vector takes on the value x(t_1) as anticipated, player j asks itself whether it can make itself better off by switching to a different strategy. Clearly, the answer is no, because its original choice of strategy obeys Bellman's principle of optimality. (See, e.g., Leonard and Long 1992, Chap. 5, for a brief introduction to the principle of optimality.) On the other hand, if by mistake some player has deviated from its planned course of action, so that the stock size x(t_1) is different from what was anticipated at time zero, then at t_1 players will in general find that they would be better off by switching to another strategy. Therefore, open-loop Nash equilibria are not robust to "trembling hand" deviations (Selten 1975). One may say that open-loop Nash equilibria are not "subgame perfect" (even though the concept of a subgame is problematic in continuous time). For this reason, let us turn to the concept of Markov-perfect Nash equilibrium, which overcomes this problem.

Define a Markovian decision-rule strategy (or simply Markovian strategy for short) as a function that determines at each (state, date) pair what action to take. If ϕ_j is player j's Markovian strategy, then

u_j(t) = ϕ_j(x(t), t).

Let Q_j be the set of Markovian strategies that are available to player j, and let Q ≡ Π_j Q_j. Once all players have chosen their Markovian strategies, the evolution of the state variables is described by

\dot{x}_i(t) = F_i(x(t), ϕ_1(x(t), t), \ldots, ϕ_N(x(t), t), t),  i = 1, 2, \ldots, n,  x_i(0) = x_{i0}

or, in vector notation,

\dot{x}(t) = F(x(t), ϕ(x(t), t), t),  x(0) = x_0

where ϕ ≡ (ϕ_1, ϕ_2, . . . , ϕ_N). Assume this differential equation has a unique solution for any initial condition (x(t_1), t_1). Define the performance index for player j at the (state, date) pair (x, t) by

J_j(x, t, ϕ) = \int_t^T e^{-r_j(\tau - t)} B_j(x(\tau), ϕ(x(\tau), \tau), \tau) \, d\tau + e^{-r_j(T - t)} S_j(x_T, T).

We define a Markov-perfect Nash equilibrium (also called a feedback Nash equilibrium) as a strategy profile \hat{ϕ} = (\hat{ϕ}_1, \hat{ϕ}_2, \ldots, \hat{ϕ}_N) ∈ Q such that, at any (state, date) pair (x, t) ∈ X × [0, T], no player can make itself better off by choosing a different strategy, i.e.,

J_j(x, t, \hat{ϕ}) \geq J_j(x, t, ϕ_j, \hat{ϕ}_{-j})  for all j.

It is important to stress the requirement that this inequality be satisfied for all possible (state, date) pairs (x, t) ∈ X × [0, T], not just for the initial pair at time zero, (x_0, 0). As Reinganum and Stokey (1985) point out, a decision-rule Nash equilibrium for a given (x_0, 0) is not necessarily Markov perfect. To be Markov perfect, a Nash equilibrium in decision rules must satisfy the additional property that the continuation of the given decision rules constitutes a Nash equilibrium when viewed from any future (state, date) pair. Dockner et al. (2000, example 4.2) give an example of a Nash equilibrium in decision rules that fails to be Markov perfect.

To find a Markov-perfect Nash equilibrium (MPNE), the usual method is to make use of the Hamilton-Jacobi-Bellman (HJB) equations that the value function of each player must satisfy. The HJB equation for player j is

r_j V_j(x, t) - \frac{\partial V_j(x, t)}{\partial t} = \max_{u_j} \left\{ B_j(x, u_j, \hat{ϕ}_{-j}(x), t) + \frac{\partial V_j(x, t)}{\partial x} F(x, u_j, \hat{ϕ}_{-j}(x), t) \right\}

with the terminal condition

V_j(x, T) = S_j(x, T).

If T is infinite, the above terminal condition is replaced by

\lim_{t \to \infty} e^{-r_j t} V_j(x(t), t) = 0.

It is worth noting that OLNE and MPNE can be thought of as based on two alternative assumptions about the ability of players to precommit. In an OLNE, players commit to a whole time path of actions. In an MPNE, players cannot precommit at all. Reinganum and Stokey (1985) argue that in some cases players may be able to commit to actions in the near future (e.g., by forward contracts), but not to actions in the distant future. They develop a simple model where a game begins at time 0 and ends at a fixed time T, and there are k periods of equal length δ, where kδ = T. At the beginning of each period, agents can commit to a path of actions during that period. The special case where k = 1 corresponds to the open-loop formulation, and OLNE is then the appropriate equilibrium concept. At the other extreme, where δ → 0, the appropriate equilibrium concept is MPNE. The choice of equilibrium concept is to some extent dependent on tractability. The relative ease of finding an OLNE is one of its attractive features. For some examples of OLNE in the economics of natural resources, see Gaudet and Long (1994) and Benchekroun et al. (2009).

4 Transboundary Stock Pollutants

4.1 A Benchmark Model

Following Long (1992) and Ploeg and Zeeuw (1992), let us consider a world consisting of two countries. Let Q_i(t) be country i's output at date t. Assume that emissions are proportional to output, E_i(t) = Q_i(t). Let P(t) denote the stock of pollution. Assume

\dot{P}(t) = E_1(t) + E_2(t) - \delta P(t)    (2)

where δ > 0 is the decay rate. The pollution damage suffered by country i at time t is (c/2)P². The net utility of country i is

U_i(t) = A E_i(t) - \frac{1}{2} (Q_i(t))^2 - \frac{c}{2} (P(t))^2

and its social welfare is

W_i = \int_0^\infty e^{-\rho t} U_i(t) \, dt

where ρ > 0 is the rate of discount.

Let us find the open-loop Nash equilibrium of this model. Since countries use path strategies in the open-loop formulation, let us suppose that country i believes that country j's emission strategy is E_j(t) = g_j^{OL}(t). Then it seeks to solve the following optimal control problem:

\max_{E_i(\cdot)} \int_0^\infty e^{-\rho t} \left[ A E_i(t) - \frac{1}{2} (E_i(t))^2 - \frac{c}{2} (P(t))^2 \right] dt    (3)

subject to

\dot{P}(t) = E_i(t) + g_j^{OL}(t) - \delta P(t),  P(0) = P_0.    (4)

Applying the maximum principle, we obtain the necessary conditions

\dot{E}_i - \rho (E_i - A) = c P + \delta (E_i - A),  \dot{P} = E_i + g_j^{OL} - \delta P,  P(0) = P_0

and the transversality condition

\lim_{t \to \infty} e^{-\rho t} (E_i(t) - A) P(t) = 0.

Since the two countries are identical, we obtain the following system of two differential equations,

\dot{E} = c P + (\rho + \delta)(E - A)    (5)

\dot{P} = 2E - \delta P,  P(0) = P_0    (6)

with the transversality condition

\lim_{t \to \infty} e^{-\rho t} (E(t) - A) = 0.    (7)

There is a unique steady-state pair (P_\infty^{OL}, E_\infty^{OL}), where

P_\infty^{OL} = \frac{2A(\delta + \rho)}{2c + \delta(\delta + \rho)}    (8)

E_\infty^{OL} = \frac{\delta P_\infty^{OL}}{2} = \frac{A \delta (\delta + \rho)}{2c + \delta(\delta + \rho)}.    (9)

Comparing with the case where the two countries cooperate and maximize the sum of their welfare, we see that the steady-state stock of pollution P_\infty^{OL} is too high.

What happens if countries use feedback strategies? Suppose country i believes that country j employs a feedback emission strategy, E_j(t) = g_j^{FB}(P(t)), so that its rate of emissions at t is conditioned on the currently observed level P(t). Then country i maximizes

\max_{E_i(\cdot)} \int_0^\infty e^{-\rho t} \left[ A E_i(t) - \frac{1}{2} (E_i(t))^2 - \frac{c}{2} (P(t))^2 \right] dt    (10)

subject to

\dot{P}(t) = E_i(t) + g_j^{FB}(P(t)) - \delta P(t),  P(0) = P_0.    (11)

Realizing that g_j^{FB}(P) is a function of the pollution stock, country i knows that it can indirectly manipulate country j's emissions at t by influencing the evolution of P. This strategic consideration was absent in the open-loop case.

To find the feedback Nash equilibria of this game, we make use of the Hamilton-Jacobi-Bellman (HJB) equations. The HJB equation for country i is

\rho V_i(P) = \max_{E_i} \left\{ A E_i - \frac{1}{2} E_i^2 - \frac{c}{2} P^2 + V_i'(P) \left[ E_i + E_j(P) - \delta P \right] \right\}

where E_j(P) is country j's feedback strategy and V_i(P) is country i's value function. The transversality condition is

\lim_{t \to \infty} e^{-\rho t} V_i(P(t)) = 0.    (12)

The first-order condition with respect to E_i is E_i = A + V_i'(P). This equation gives E_i = E_i(P), i.e., country i's emissions depend only on P. Appealing to symmetry, we get the HJB equation

\rho V(P) = \frac{1}{2} \left[ A^2 + 4 A V' + 3 (V')^2 \right] - \delta P V' - \frac{c}{2} P^2.    (13)

This equation and the transversality condition Eq. (12) identify the set of possible Markov-perfect Nash equilibria. Let us conjecture that the value function is quadratic:

V(P) = -\frac{\omega P^2}{2} - \pi P - \mu.    (14)

Then V'(P) = -\omega P - \pi, and hence the feedback strategy is linear:

E(P) = A - \pi - \omega P.    (15)


It is plausible to expect that ω > 0, i.e., a higher stock will make countries choose lower emissions, and π > 0, i.e., if P = 0, the marginal effect on welfare of an exogenous increase in P is negative. Making use of Eq. (14) and Eq. (15), the HJB equation gives a quadratic equation of the form λ₀ + λ₁P + λ₂P² = 0, where λ₀, λ₁, and λ₂ are expressions involving the parameters δ, ρ, c and the coefficients ω, π, μ. Since this equation must hold for all P, it follows that λ_i = 0 for i = 0, 1, 2. Using these three conditions, we can solve for ω, π, and μ. We obtain

\omega = \frac{1}{3} \left[ -\left( \delta + \frac{\rho}{2} \right) + \sqrt{ \left( \delta + \frac{\rho}{2} \right)^2 + 3c } \right].    (16)

(To ensure convergence to a steady state, the positive root ω > 0 is selected.) Next, π and μ follow as

\pi = \frac{2 A \omega}{\delta + \rho + 3\omega},  \mu = -\frac{(A - \pi)(A - 3\pi)}{2\rho}.

The linear feedback strategy is

E = \frac{A (\delta + \rho + \omega)}{\delta + \rho + 3\omega} - \omega P.

It follows that

\dot{P} = \frac{2 A (\delta + \rho + \omega)}{\delta + \rho + 3\omega} - (2\omega + \delta) P.    (17)

For P to converge to a steady state, it is necessary that 2ω + δ > 0. This inequality is satisfied if and only if the positive root for ω is selected. The steady-state pollution stock under the MPNE with linear feedback strategies is

P_\infty^{FB} = \frac{2 A (\delta + \rho + \omega)}{(\delta + \rho + 3\omega)(2\omega + \delta)}.    (18)

Clearly, the OLNE steady-state pollution stock P_\infty^{OL} is lower than the MPNE steady state P_\infty^{FB}. This result depends on the fact that we have focused on a quadratic functional form for the value function V_i(P). Dockner and Long (1993) show that there are other value functions that satisfy the HJB equation. These value functions result in nonlinear emission strategies. In fact, there is a continuum of nonlinear strategies, and some of them outperform the OLNE in the sense that both countries would be better off under such strategies. When there are multiple equilibria, it is not clear which one is likely to prevail.
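The two steady states can be compared directly from the closed forms in Eqs. (8) and (16)-(18). The short sketch below is ours; the parameter values are arbitrary and serve only to illustrate that the feedback (MPNE) steady-state stock exceeds the open-loop one.

import numpy as np

A, c, delta, rho = 1.0, 1.0, 0.1, 0.05

# Open-loop steady state, Eq. (8).
P_OL = 2 * A * (delta + rho) / (2 * c + delta * (delta + rho))

# Slope of the linear feedback strategy, Eq. (16) (positive root).
half = delta + rho / 2
omega = (-half + np.sqrt(half**2 + 3 * c)) / 3

# Feedback (MPNE) steady state, Eq. (18).
P_FB = 2 * A * (delta + rho + omega) / ((delta + rho + 3 * omega) * (2 * omega + delta))

print(f"P_OL = {P_OL:.4f}, P_FB = {P_FB:.4f}")   # P_FB > P_OL: the tragedy is worse under feedback play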

4.2 Centralized Versus Regional Control of Pollution

List and Mason (2001) consider an asymmetric version of the transboundary pollution model of Dockner and Long (1993) for the case of two regions, where pollution management can be centralized, i.e., dictated by a federal authority such as the EPA in the case of the USA or the CEA in Canada, or decentralized, i.e., regulation is chosen by local states or provinces. In their model, the two regions have different parameter values in the regional damage and production functions:

U_1(t) = A E_1(t) - \frac{1}{2} (Q_1(t))^2 - \frac{c}{2} (P(t))^2

and

U_2(t) = \alpha A E_2(t) - \frac{1}{2} (Q_2(t))^2 - \beta \frac{c}{2} (P(t))^2

where α and β characterize differences between the two regions in vulnerability to flow and stock pollution as well as differences in abatement costs. An alternative interpretation is that differences in instantaneous utilities are the result of population differences.

They characterize the equilibrium obtained when a central authority, whose objective is to maximize the sum of the two regions' discounted welfare, sets the environmental policy, assuming it is constrained by the constitution to set uniform environmental policies in both regions. Given the asymmetry of the two regions, the central authority cannot achieve a first best. They show that if β = 0, i.e., one region is not affected by the stock of pollution, and α = 1, then welfare values under central control exceed those under decentralized control. However, when β = 0 and α is large enough, then for small values of the stock of pollution the present value of combined payoffs for the two regions is larger under decentralized than under centralized control. The larger the asymmetry between the two regions, the larger the cost of implementing a central authority's plan under the constraint of uniform regulation. When the asymmetry is large enough, the distortion introduced by the constraint outweighs the gains from the elimination of free riding (under a central authority). This result can be extended to the case where β > 0 using a continuity argument. When β = 0, it is shown that decentralization always results in higher rates of emissions and a larger steady-state stock of pollution than under central management.


List and Mason (1999) examine whether environmental regulations should be carried out locally or centrally, i.e., by local regulators or by a central authority. Localities are assumed to have superior information or more leniency to adopt new environmental regulations. They consider the case of several pollutants. Consider two regions (states or provinces) indexed by i = 1, 2. In each region i, production generates two flows of emissions, one local and one transboundary, denoted by F_i and E_i, respectively. These flows of emissions accumulate and form stocks of pollutants: a local stock pollutant denoted by Z_i and a transboundary pollution stock denoted by P. The evolution of the stocks is given by

\dot{Z}_i = F_i - l Z_i  and  \dot{P} = E_1 + E_2 - k P

where k, l > 0 are nature's purification rates for the transboundary and the local pollutant, respectively. The instantaneous utility U_i of region i is given by

U_i = A E_i + B F_i - \frac{1}{2} E_i^2 - \frac{1}{2} F_i^2 - \frac{X}{2} Z_i^2 - \frac{S}{2} P^2 - \rho Z_i P

where X and S are positive damage parameters. The parameter ρ captures the interaction between the local and the transboundary pollutants. When the interaction reduces damages, we have ρ < 0; when ρ > 0, the two pollutants have synergistic negative effects. It is assumed that local authorities know the value of ρ, whereas the central authority ignores the true value of ρ and uses ρ = 0 when choosing the optimal emission policy under a centralized system. They examine when local regulation dominates a central system in the case of carbon dioxide (the transboundary pollutant) and sulfur (the local pollutant), using parameter values based on empirical evidence. They show that there exist ρ⁺ > 0 and ρ⁻ < 0 such that the benefits from local control more than offset the benefits from central control if synergistic effects are such that ρ > ρ⁺ or ρ < ρ⁻.
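As a quick illustration of these stock dynamics, the following sketch (ours, with invented parameter values) simulates the two local stocks and the shared transboundary stock under constant emission flows and evaluates the instantaneous utility, including the interaction term ρ Z_i P that the central regulator is assumed to ignore.

import numpy as np

A, B, X, S, rho = 1.0, 1.0, 0.2, 0.1, 0.05   # illustrative parameters
k, l = 0.08, 0.20                            # purification rates
E = np.array([0.5, 0.7])                     # constant transboundary emission flows
F = np.array([0.6, 0.4])                     # constant local emission flows

Z, P, dt = np.zeros(2), 0.0, 0.01
for _ in range(100_000):                     # Euler integration toward the steady state
    Z += (F - l * Z) * dt
    P += (E.sum() - k * P) * dt

U = A * E + B * F - 0.5 * E**2 - 0.5 * F**2 - 0.5 * X * Z**2 - 0.5 * S * P**2 - rho * Z * P
print("steady-state stocks:", Z, P)          # analytically F/l and (E1+E2)/k
print("instantaneous utilities:", U)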

5 Provision of Clean Air and Interregional Mobility of Capital

Regional governments may impose a capital income tax to finance the provision of public goods such as clean air. However, a tax imposed on the earnings of a factor of production will encourage that factor to move to another jurisdiction where the tax rate is lower (with the exception of completely immobile factors, such as land and mineral resources). Stigler (1965) points out that if all factors are mobile, redistributive taxation in a multi-jurisdiction world is practically infeasible. Zodrow and Mieszkowski (1986) show that if regional governments compete in source-based capital income tax rates, there will be a race to the bottom, leading to the underprovision of local public goods. The theoretical literature has identified size differences as a factor explaining why different jurisdictions are affected asymmetrically by tax competition (Bucovetsky 1991; Wilson 1991). For a generalization to two tax instruments, see Bucovetsky and Wilson (1991). Wang (1999) assumes sequential moves: the bigger region is the Stackelberg leader.

A number of two-period models have been developed to investigate the implications of simple dynamic games between the owners of partially mobile factors of production, on the one hand, and a local government that tries to redistribute income in favor of some group, on the other hand. Lee (1997) shows that if capital movements involve adjustment costs, there will be a wedge between the internal rate of return and the external one. Jenson and Thomas (1991) model a game between two governments that use debt policies to influence the intertemporal structure of taxation. Huizinga and Nielsen (1997) formulate a two-period model in which, even though capital is perfectly mobile, foreign capitalists in effect earn rents from local immobile resources.

Wildasin (2002, 2011) presents a continuous-time, infinite-horizon model in which infinitely lived agents react to changes in taxation by moving resources across jurisdictions. Adjustment costs are explicitly taken into account. The author focuses on the case of a once-over tax change and does not deal with optimal time-varying tax rates. Instead, the analysis emphasizes the costly adjustment process and draws on the adjustment cost literature in macroeconomics (Turnovsky 2000). The main point is that capital earns quasi-rents which can be taxed away, but such quasi-rents erode with time. The optimal capital income tax rate depends crucially on the degree of capital mobility. Wildasin does not model a dynamic game involving competition among jurisdictions to attract mobile resources.

A truly dynamic game model is that of Koethenbuerger and Lockwood (2010). The authors consider an infinite-horizon dynamic version of the model of Zodrow and Mieszkowski (1986). There are n regions, with one firm in each region. Each region is subject to a stochastic output shock. These shocks imply that households would like to diversify their portfolios, and this dampens the tax competition among regional governments. Under logarithmic utility, they show that the Nash equilibrium path of the capital income tax rate is time invariant. This constant tax rate is increasing in the preference parameter for the public good, the rate of discount, and the volatility of the output shock. There exists a critical threshold n̂ such that the equilibrium tax rate is increasing in n if n < n̂ and decreasing in n if n > n̂. As n tends to infinity, the equilibrium tax rate tends to zero, which is an inefficient outcome.

6 Conclusion

Both static and dynamic games have been successfully employed to shed light on many resource and environmental issues involving strategic interactions among a number of players. The insights generated by game theoretic models can potentially be used to help design mechanisms for improving economic efficiency. In particular,


empirical models are useful tools for policy making. Conversely, issues in resource and environmental economics have provided opportunities for researchers to sharpen their tools and to develop new concepts and techniques for dealing with emerging issues. Because of space limitations, we have covered noncooperative games only. We have omitted empirical models of dynamic games in resource and environmental economics that use real-world data to calibrate parameters of demand and cost functions. We have also omitted games with asymmetric information. We refer the reader to the recent surveys in Long (2010, 2011) and Jorgensen et al. (2010).

References

Barrett S (1994) Self-enforcing international environmental agreements. Oxf Econ Pap 46:878–894
Benchekroun H, Long NV (2011) Static and dynamic games in environmental and resource economics. In: Batabayal A, Nijkamp P (eds) Research tools in natural resource and environmental economics. World Scientific, Hackensack
Benchekroun H, Halsema A, Withagen C (2009) On nonrenewable resource oligopolies: the asymmetric case. J Econ Dyn Control 33:1867–1879
Bucovetsky S (1991) Asymmetric tax competition. J Urban Econ 30(2):167–181
Bucovetsky S, Wilson J (1991) Tax competition with two tax instruments. Reg Sci Urban Econ 21(3):333–350
Dockner E, Long NV (1993) International pollution control: cooperative versus non-cooperative strategies. J Environ Econ Manag 25:13–29
Dockner EJ, Jorgensen S, Long NV, Sorger G (2000) Differential games in economics and management science. Cambridge University Press, Cambridge, UK
Gaudet G, Long NV (1994) On the effects of the distribution of initial endowments in a non-renewable resource duopoly. J Econ Dyn Control 18:1189–1198
Huizinga H, Nielsen SB (1997) Capital income and profit taxation with foreign ownership of firms. J Int Econ 42:149–165
Jenson R, Thomas EF (1991) Debt in a model of tax competition. Reg Sci Urban Econ 21:371–392
Jorgensen S, Martin-Herran G, Zaccour G (2010) Dynamic games in the economics and management of pollution. Environ Model Assess 15:433. https://doi.org/10.1007/s10666-010-9221-7
Koethenbuerger M, Lockwood B (2010) Does tax competition promote growth? J Econ Dyn Control 34(2):191–206
Lee K (1997) Tax competition with imperfectly mobile capital. J Urban Econ 42:222–242
Leonard D, Long NV (1992) Optimal control theory and static optimization in economics. Cambridge University Press, New York/Cambridge, UK
List JA, Mason CF (1999) Spatial aspects of pollution control when pollutants have synergistic effects: evidence from a differential game with asymmetric information. Ann Reg Sci 33(4):439–452
List JA, Mason CF (2001) Optimal institutional arrangements for transboundary pollutants in a second-best world: evidence from a differential game with asymmetric players. J Environ Econ Manag 42(3):277–296
Long NV (1992) Pollution control: a differential game approach. Ann Oper Res 37:283–296
Long NV (2010) A survey of dynamic games in economics. World Scientific, Singapore
Long NV (2011) Dynamic games in the economics of natural resources: a survey. Dyn Games Appl 1(1):115–148
Reinganum JF, Stokey NL (1985) Oligopoly extraction of a common property natural resource: the importance of period of commitment in dynamic games. Int Econ Rev 26:161–173


Rubio S, Ulph A (2006) Self-enforcing international environmental agreements revisited. Oxf Econ Pap 58(2):233–263
Selten R (1975) Reexamination of perfectness concepts for equilibrium points in extensive form games. Int J Game Theory 4(1):25–55
Stigler G (1965) The tenable range of functions of local government. In: Phelps ES (ed) Private wants and public needs. W. W. Norton, New York, pp 167–176
Turnovsky SJ (2000) Methods of macroeconomic dynamics. MIT Press, Cambridge, MA
van der Ploeg F, de Zeeuw AJ (1992) International aspects of pollution control. Environ Resour Econ 2:117–139
Wang Y-Q (1999) Taxes under fiscal competition: Stackelberg equilibrium and optimality. Am Econ Rev 89(4):947–981
Wildasin DE (2002) Fiscal competition in space and time. J Public Econ 87:2571–2588
Wildasin DE (2011) Fiscal competition for imperfectly-mobile labor and capital: a comparative dynamic analysis. J Public Econ 95(11):1312–1321
Wilson J (1991) Tax competition with interregional difference in factor endowments. Reg Sci Urban Econ 21(3):423–451
Zodrow GR, Mieszkowski P (1986) Pigou, Tiebout, property taxation, and the under-provision of local public goods. J Urban Econ 19:356–370

Methods of Environmental Valuation

John Loomis, Christopher Huber, and Leslie Richardson

Contents
1 Introduction
2 Benefit Measures
2.1 Use Values
2.2 Nonuse or Passive Use Values
3 Overview of Methods and How They Relate to Values
4 Hedonic Property Method
4.1 Economic Theory Underlying the Hedonic Property Method
4.2 Data Requirements
4.3 Econometric Modeling Including Spatial Dimensions
5 Travel Cost Models
5.1 Single-Site Trip Frequency Models of Recreation Demand
5.2 Valuing Quality Changes Using Trip Frequency Models
5.3 Multiple-Site Selection Models
5.4 Data Requirements for Travel Cost Models
6 Stated Preference Models
6.1 Contingent Valuation Method
6.2 Choice Experiments
6.3 The Issue of Bias in Stated Preference Surveys
7 Combining Stated and Revealed Preference Methods and Data
8 Benefit Transfer
9 Conclusion
10 Cross-References
References

J. Loomis (*)
Department of Agricultural and Resource Economics, Colorado State University, Fort Collins, CO, USA
e-mail: [email protected]

C. Huber
U.S. Geological Survey, Fort Collins Science Center, Fort Collins, CO, USA
e-mail: [email protected]

L. Richardson
National Park Service, Social Science Program, Fort Collins, CO, USA
e-mail: [email protected]

Abstract

Commensurate valuation of market and nonmarket public goods allows for a more valid benefit-cost analysis. Economic methods for valuing nonmarket public goods include actual behavior-based revealed preference methods, such as the hedonic property method for urban-suburban public goods and travel cost models for outdoor recreation. For valuing proposed public goods for which there is no current behavior, or valuing the existence or passive use values of public goods, economists can rely on stated preference methods. While there is skepticism among some economists for relying on what people say they will pay rather than what their actual behavior suggests they will pay, there is general acceptance of stated preference methods. These stated preference methods include the well-known contingent valuation method and choice experiments (sometimes called conjoint analysis). Lastly, in situations where there is neither time nor money to conduct an original revealed or stated preference study, economists can rely on benefit transfers from existing revealed preference and stated preference studies to provide rough estimates of the values of public goods such as water quality, air quality, wetlands, recreation, and endangered species.

Keywords

Hedonic property · Choice experiment · Contingent valuation · Revealed preference · Stated preference · Travel cost models

1 Introduction

One of the long-standing deviations from economic efficiency, even in a perfectly competitive market with no subsidies to producers or consumers, is the presence of negative externalities and the underprovision of public goods. In the face of these market failures, government intervention has the potential to improve economic efficiency, for instance, by imposing pollution taxes or tradeable permits to internalize negative externalities into the prices of goods associated with pollution. Further, government has the potential to improve economic efficiency by supplying or financing the supply of optimal amounts of a public good. However, the emphasis here is on the potential to improve economic efficiency through government action. For this potential to be realized, the level of the pollution taxes, for instance, must be set equal to the marginal environmental cost at the socially optimal level of output. Thus, to achieve this optimum requires having an estimate of the marginal environmental cost of pollution, or


alternatively, the marginal benefits of improving environmental quality (e.g., air quality, water quality). The same is true of public goods: the government must determine the marginal benefits of these public goods to society and compare them to the cost of producing alternative levels of the public goods in order to determine an optimum level to supply. Benefit-cost analysis is a technique used by government to determine if the benefits of increased environmental quality or provision of public goods are worth the cost. One of the greatest challenges of benefit-cost analysis is estimating the nonmarket benefits of regulations used to internalize negative externalities (e.g., installation of pollution control devices) and government supply of public goods (e.g., preservation of remote wilderness areas). This chapter is devoted to a review of environmental valuation methods frequently used by a wide variety of economists (i.e., academic, government, consultants) to estimate the economic benefits of improving environmental quality and providing public goods. The conceptual foundation of all environmental valuation methods is reviewed first. This is followed by a discussion of behavior-based environmental valuation methods. These methods are usually referred to as revealed preference methods and include the hedonic property method and the travel cost method. This section is followed by a review of stated preference methods, including the contingent valuation method and choice experiments. The next-to-last section discusses how revealed preference and stated preference methods can be combined to provide more robust environmental valuations. Finally, "short cut" valuation methods called benefit transfer are reviewed.

2 Benefit Measures

Value has many different meanings, and it is important for economists to be precise as to what they mean by the economic value or benefits of environmental quality and public goods. The economic value or benefit received by a person for any good, whether marketed or nonmarketed, is the maximum amount they would pay for it. The term economists use for this is maximum willingness to pay (WTP). WTP is shorthand for both willingness and ability to pay. When estimated as the area under a consumer's demand curve and above any costs actually paid to consume the good, it is usually referred to as consumer surplus. While there are many theoretical refinements to this measure, for an applied economist, consumer surplus is generally considered a reasonable approximation to the more theoretically correct concepts of consumer well-being. It is worth noting that nothing thus far has been said about jobs created by the production of a public good as an economic efficiency benefit, or jobs lost with environmental regulation as a cost. Except in times of unusually high and persistent unemployment, gains in jobs in one industrial sector are usually made up in another. Likewise, jobs lost in one geographic area are usually made up in another. Hence, jobs are considered transfers of economic activity from one industrial


sector or geographic area to another. In other words, changes in jobs are not net gains or net losses to the economy as a whole and are usually excluded from an economic efficiency analysis such as benefit-cost analysis.
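To make the consumer surplus measure above concrete, a small worked example follows; the linear demand curve and the prices in it are assumptions chosen purely for illustration.

```latex
% Suppose a visitor's inverse demand for trips is p = 10 - q (dollars per
% trip), and the actual cost paid per trip is p_0 = 4, so q_0 = 6 trips
% are taken. Consumer surplus is the area under demand above the cost paid:
\[
CS = \int_0^{q_0} \big[(10 - q) - p_0\big]\, dq
   = \int_0^{6} (6 - q)\, dq
   = \tfrac{1}{2}(10 - 4)(6) = \$18 .
\]
% This $18 is the visitor's maximum WTP over and above what is actually paid.
```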

2.1 Use Values

For most market and nonmarket goods and services, economic benefits are largely received by individuals who actually consume or directly use the good. The benefits of another hamburger or a new reservoir are primarily to the consumers who use them. In the reservoir example, use values would accrue to those who receive drinking water from the reservoir, receive flood protection, or engage in recreation activities such as water skiing at the reservoir. The vast majority of benefits from a project or policy typically fall into the use category, as this is a very broad category. Use values include, for example, the value of publicly provided recreation, scenic views at national parks, and subsistence uses such as hunting for food and firewood collection. Use values also relate to reduction in health damages from cleaning up hazardous waste sites and improving air and water quality. These use values are also measured by the users’ maximum willingness to pay, so that there is consistency between valuation of market goods and nonmarket goods (i.e., the dollars are commensurate).

2.2 Nonuse or Passive Use Values

There are, however, unique natural resources such as Yellowstone National Park and rare/endangered species such as California condors and tigers, from which people often receive benefits from just knowing they exist in the wild. This type of value is known as existence value (Freeman 2003). Receiving this benefit does not require any on-site use. Rather, there is an enjoyment from reflecting on the existence of wilderness in Alaska's Arctic National Wildlife Refuge undisturbed by oil and gas drilling. Likewise, some people receive enjoyment and satisfaction from simply knowing that these unique natural environments or species will be protected for future generations. This bequest value also does not require the individual who derives the benefit to set foot in the area or personally view it. Existence and bequest values are sometimes called nonuse values (Freeman 2003) or passive use values (U.S. District Court of Appeals 1989; Arrow et al. 1993). These values have been the focus of studies on biodiversity (see Abdullah et al. 2011 for a review of these valuation studies), as well as assessments of natural and cultural resource damages. For instance, the Exxon Valdez oil spill of 1989, an important 1989 court opinion (Ohio v. U.S. Department of the Interior), and passage of the Oil Pollution Act in 1990 all led to a heightened interest in the estimation of passive use values (Carson et al. 2003). Passive use values were again a critical component in determining the monetary value of natural resource damages from the 2010 BP Deepwater Horizon oil spill (see Bishop et al. 2017). Given that everyone can simultaneously enjoy the knowledge that a given unique natural environment exists or exists in a healthy state, passive use values have the


characteristics of public goods. It is particularly difficult to value the subset of public goods that have a substantial nonuse value since there is little tie to observable consumer behavior. However, as discussed later in this chapter, economists have developed and implemented stated preference valuation methods that can measure benefits that are independent of any observable use of a particular resource. These passive use values are also measured by the maximum amount that people who benefit from these public goods would pay for them. This ensures consistency between passive use values, use values, and market values.

3 Overview of Methods and How They Relate to Values

There are two broad classes of valuation methods for nonmarket resources. Revealed preference methods refer to approaches that indirectly infer WTP based on market transactions for other related goods. For example, estimating a demand curve for recreation is based on the variation in visitors' travel costs. From the demand curve, visitors' WTP or consumer surplus can be calculated. The generic label for this type of revealed preference method is the Travel Cost Method because it relies on travel behavior and travel costs. Another revealed preference method is the Hedonic Property Method. This method disaggregates the price of a house purchased into the attributes of the house itself (e.g., bedrooms, bathrooms), the neighborhood (e.g., school quality), and the surrounding environment (e.g., distance to work, distance to an amenity or disamenity to be valued). Since houses with proximity to desirable environmental attributes are demanded by more households, this drives up their price. The price premium for a location close to an amenity, such as an open space, a park, or good air quality, can then be inferred. In contrast, Stated Preference methods, such as the Contingent Valuation Method or Choice Experiments, rely on what people say they would intend to pay if a certain scenario occurs: for example, how much more a respondent would pay in trip costs for access to a recreation site with better water quality, or how much more they would pay in taxes to protect an endangered species in a remote area. As will be discussed in more detail below, stated preference methods have the advantage of being quite flexible, so they can measure both use and passive use values. Stated preference methods can also value a wide range of public goods, including health, air quality, water quality, recreation, and endangered species. However, this flexibility comes at the price of potential hypothetical bias, where respondents to the survey may state they will pay more for the public good than they would actually pay when they must hand over their own hard-earned money. Section 6.3 describes what the literature finds with regard to when hypothetical bias is more likely to occur and what can be done to reduce it. It is important to emphasize that all of these estimation methods are simply alternative tools for measuring WTP. They do it differently, but the measure of value is still the same. At the end of this chapter, we will also talk about how revealed preference and stated preference data can be combined to utilize the strengths of each method. But for now, we will discuss each method separately.

4 Hedonic Property Method

This revealed preference technique has been applied to estimating house price differentials with respect to natural hazards (e.g., earthquakes, floods, fires), environmental quality (air pollution, water pollution), and recreation access (e.g., open space, beaches). To understand how this versatile technique works, we will first review the theory underlying it, the data requirements, the econometric estimation, and finally, how WTP is calculated from the regression results.

4.1 Economic Theory Underlying the Hedonic Property Method

Competition for houses with desirable amenities pushes up the prices of these houses. Likewise, to entice home buyers to purchase homes with less desirable locations or disamenities, sellers must lower their prices. These premiums and discounts are intuitive, but in order to develop valid estimates of WTP, there must be a close link between the theoretical foundation and the empirical estimation. Further, any empirical model is based on a set of assumptions which are often embedded in economic theory. Below, we summarize the theory (see Taylor 2017 for a more comprehensive discussion of the theory and empirical methods). With the hedonic property method, standard assumptions that consumers maximize utility and sellers maximize profits are employed. The consumer's utility function is Lancasterian in nature, being specified in terms of the attributes of the house structure itself and its location. A stylized representation of the utility function is:

$U_i(X, A_s, A_n, A_e)$   (1)

where Ui is the utility of person i, and X represents all other nonhousing goods and is sometimes referred to as a composite commodity. The A's represent attributes of the housing structure itself (As, e.g., number of bedrooms and baths), the neighborhood (An, e.g., education levels), and the environment (Ae, e.g., air quality or water quality). This utility function is maximized subject to the consumer's budget constraint (where the price of the composite commodity is normalized to one). The consumer optimum occurs where:

$\partial P_h / \partial A_i = (\partial U / \partial A_i) / (\partial U / \partial X)$   (2)

where Ph is the price of the house. The interaction of the producers’ minimum willingness to accept to supply attributes and consumers’ maximum WTP for attributes results in an equilibrium price schedule for attributes As, An, and Ae. In an equilibrium between the producer


and consumers, ∂Ph/∂Ai is the marginal WTP for small changes in one of these attributes, noted as Ai in Eq. (2). From the theory comes an estimable hedonic price function, as illustrated in Eq. (3):

$P_h = \beta_0 + \beta_1 HS + \beta_2 SQ + \beta_3 NInc + \beta_4 DWork + \beta_5 EQ$   (3)

where Ph is the Price of the house, HS is housing structure size, SQ is School Quality (e.g., graduation rates), NInc is neighborhood income – often proxied based on census tract or zip code, DWork is distance to employment centers, and EQ is environmental quality, such as air quality (parts per million of key pollutants), distance to open space, or distance to a disamenity like a landfill. The implicit price or marginal WTP for a small change in any attribute of the house structure, neighborhood, or environmental quality is simply the regression slope coefficient if the hedonic price function is linear. If the house price function is nonlinear, as it typically is, then the contribution of each additional unit of attribute to the house price is also related to the absolute level of house price. In this case, the formula for marginal WTP is slightly more complicated. Taylor (2017) provides formulas for the implicit price function for a variety of nonlinear functional forms. Since the implicit price function is for a marginal change in attribute levels, it will overstate the benefits of a large increase in attributes but understate the loss associated with large decreases in attributes. To accurately estimate the benefits for large gains or losses in attributes, a second stage hedonic demand model for the specific attribute must be estimated. Since discussion of this is beyond the scope and space available here, the interested reader should see Taylor (2017) for more details.
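To make Eq. (3) and its implicit prices concrete, the following is a minimal sketch in Python using simulated data; the data-generating coefficient values are illustrative assumptions, not estimates from any actual housing market, and the statsmodels OLS routine simply stands in for whatever regression software an analyst would use.

```python
# A minimal sketch of estimating a linear hedonic price function like Eq. (3)
# on simulated data; all "true" parameter values below are assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 500

hs = rng.uniform(1000, 3500, n)        # housing structure size (sq ft)
sq = rng.uniform(60, 95, n)            # school quality (graduation rate, %)
ninc = rng.uniform(40, 120, n)         # neighborhood income ($1,000s)
dwork = rng.uniform(1, 40, n)          # distance to employment centers (miles)
eq_air = rng.uniform(0, 30, n)         # air pollution (ppm of a key pollutant)

# Assumed "true" implicit prices used to simulate sale prices
price = (50_000 + 90 * hs + 1_200 * sq + 500 * ninc
         - 1_500 * dwork - 2_000 * eq_air + rng.normal(0, 20_000, n))

X = sm.add_constant(np.column_stack([hs, sq, ninc, dwork, eq_air]))
fit = sm.OLS(price, X).fit()

# In a linear hedonic function, each slope coefficient is the implicit
# price, i.e., the marginal WTP for a one-unit change in that attribute.
labels = ["const", "HS", "SQ", "NInc", "DWork", "EQ"]
for name, b in zip(labels, fit.params):
    print(f"{name:>6}: {b:12,.0f}")
# The negative coefficient on EQ is the marginal WTP to avoid one more unit
# of pollution; the sign reverses for amenities such as open space.
```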

4.2 Data Requirements

The data required for this method are of course quite detailed. The analyst needs house sale prices, characteristics of the home, characteristics of the neighborhood, and the environment. This requires obtaining at least three different data sources. House sale prices and house characteristics are often available from county tax assessors' offices or from private, third-party real estate services. Characteristics of the neighborhood such as income, ethnicity, and average age are often found in block-level data available from a government's population census office or sold by third-party vendors. Data on environmental quality of the neighborhood are often obtained from some form of monitoring station or field data. Location of houses relative to the amenity or the disamenity is often calculated using Geographic Information System (GIS) software. This requires that housing data be "georeferenced" in some form, whether using street address or coordinates. Needless to say, assembling the data can be time consuming but no more so than for the other methods we will review.

4.3 Econometric Modeling Including Spatial Dimensions

Since the implicit prices are essentially the regression coefficients, an econometric model must be estimated using the data assembled above. Historically, nearly all hedonic price functions were estimated using ordinary least squares regression in one form or another. More recently, there have been concerns about spatial dependence of prices between houses located near one another (e.g., in the same neighborhood), a situation that requires special econometric attention. One consideration is the potential influence of what economists call endogenous variables arising from joint determination. For example, if a researcher is interested in the effect of proximity to open space on property values, there might be a situation in the urban-rural fringe where the price of developable property is influenced by a neighboring, and also potentially developable, open space or property. Another spatial econometric consideration arises from housing sale prices in a particular neighborhood being related to some characteristic shared by houses in that neighborhood that is unobservable to the analyst. Since the characteristic is not observed by the analyst, it is an omitted variable in the regression equation. In the last decade, spatial econometric methods have been developed to address these problems. In cases where endogenous variables exist, the standard econometric approach is to use instrumental variables that are correlated with the endogenous variable but uncorrelated with the unobserved determinants of the house sale price. For the situation with unobserved omitted variables, several approaches can be taken, including the use of a spatial weights matrix, a spatial autoregressive term, and quasi-experimental or natural experiment techniques (see Taylor 2017 for a detailed description of each approach). At present, some studies show that using these advanced methods may result in more accurate estimates of the implicit prices, but in other cases there is little difference (Mueller and Loomis 2008). The interested reader should see Taylor (2017) and Irwin and Bockstael (2001) for more details on these econometric issues.
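To make one of these fixes concrete, the sketch below hand-codes a spatial two-stage least squares estimator for a hedonic model with a spatially lagged price term, P = ρWP + Xβ + ε, using the spatially lagged attributes WX as instruments (a Kelejian–Prucha-style approach). The k-nearest-neighbor weights matrix, the simulated data, and all parameter values are assumptions for demonstration only; in applied work one would typically use a dedicated package such as PySAL.

```python
# A minimal numpy sketch of spatial two-stage least squares (S2SLS) for a
# hedonic model P = rho*W*P + X*b + e, instrumenting W*P with W*X.
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 5                           # houses, nearest neighbors

coords = rng.uniform(0, 10, (n, 2))     # house locations
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
np.fill_diagonal(d, np.inf)
W = np.zeros((n, n))
rows = np.arange(n)[:, None]
W[rows, np.argsort(d, axis=1)[:, :k]] = 1.0
W /= W.sum(axis=1, keepdims=True)       # row-standardize the weights

X = np.column_stack([np.ones(n), rng.uniform(1, 4, n)])  # const, house size
rho, beta = 0.4, np.array([100.0, 60.0])                 # assumed "truth"
P = np.linalg.solve(np.eye(n) - rho * W, X @ beta + rng.normal(0, 5, n))

# Stage 1: project W@P on instruments [X, W@X]; Stage 2: OLS with fitted W@P
Z = np.column_stack([X, (W @ X)[:, 1:]])
WP_hat = Z @ np.linalg.lstsq(Z, W @ P, rcond=None)[0]
Xs = np.column_stack([WP_hat, X])
est = np.linalg.lstsq(Xs, P, rcond=None)[0]
print("estimated rho and betas:", est)  # should be near 0.4, 100, 60
```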

5 Travel Cost Models

The revealed preference travel cost model essentially involves estimating a demand function for recreation. As such, the underlying theory is that of consumer demand theory. A visitor is assumed to maximize their utility subject to a budget constraint. Much like consumer demand theory, there are several admissible utility functions which result in different demand specifications. Besides the visitor's own price of visiting the recreation site of interest, these demand functions should ideally include other information such as the visitor's income and the price of visiting substitute sites. The details of how this conceptual demand model is implemented are specific to the different forms of the travel cost model, each of which is reviewed below.

5.1 Single-Site Trip Frequency Models of Recreation Demand

Many public recreation sites have no entrance fee or a minimal administratively set fee, and as a result, nearly all the implicit price paid for access to the recreation site is the travel cost incurred by the visitor. Thus, travel costs act as a proxy for price in estimating the demand curve. The use of travel cost as a proxy for price hinges on a couple of key assumptions: (a) all travel costs are incurred exclusively to visit the site of interest, and only that site, on a trip from home, and (b) there are no significant benefits derived from the travel en route to the recreation area (i.e., the sightseeing on the way to the site has little value). To meet assumption (a), visitors can be queried as to whether they are visiting multiple sites on the same trip. If so, they are excluded from the estimation data in most simple trip frequency models but can be included in more complex trip frequency models (Loomis et al. 2000). Travel cost models employ cross-sectional data that use spatial variation in visitors' travel costs. There is variation in visitors' travel costs because visitors live at varying distances from the site. With a trip frequency model, the dependent variable is the number of trips each visitor takes over the year or the season to a particular recreation area. Visitors who live farther away from the site have higher travel costs. Similar to any downward-sloping demand curve, the number of trips taken to the recreation site (quantity demanded) decreases as the travel cost (price) of reaching the site increases. The price variable includes transportation costs (e.g., gasoline), but there may be other variable costs of the trip that would be included in the travel cost variable. These might include lodging or camping fees. Another variable that is usually included as an independent variable in a travel cost model is the visitor's travel time to the site. However, this variable is often so highly correlated with the transportation cost that it cannot be included by itself. In that case, the monetary opportunity cost of this time is used to combine the cost of travel time with the transportation cost. Ideally, trip costs to the nearest substitute sites would be included as well, although these variables are sometimes correlated with travel cost to the site under study, thus making it difficult to include them in the model. Since we are estimating a demand function, visitor income is usually an appropriate independent variable to include. Other explanatory variables that capture visitor characteristics believed to influence the number of trips taken (e.g., number of children, age) are also frequently included. Visitor demographics can act as proxies to control for differences in tastes and preferences. A single-site trip frequency model is useful if the analyst is interested in: (a) the value of current recreation at the site and (b) the loss in consumer surplus if the site were closed due to agency budget cuts or reallocation of the land to an alternative use (e.g., mining). An example single-site demand curve specification is given in Eq. (4) for visitor i:

$AnTrips_i = \beta_0 - \beta_1 TC_i + \beta_2 TC\_Substitute_i + \beta_3 Inc_i + \beta_4 Z_i$   (4)

where AnTripsi is the annual number of trips of visitor i to the site, TCi is the round trip travel cost of visitor i (transportation cost and time cost), TC_Substitutei is the


travel cost of visitor i to the nearest substitute site, Inci is income of visitor i, and Zi is a vector of characteristics specific to the individual, such as age. There are several econometric specifications of single-site trip frequency models. Historically, most trip frequency models were estimated with ordinary least squares regression. However, since the early 1990s, count data regression models have been used to more accurately model the number of trips taken, which is a nonnegative integer. Count data models include the Poisson and the Negative Binomial. The Negative Binomial count data model does not require that the mean of trips equal the variance of trips as the Poisson model does. Since count data models are exponential models, they are equivalent to the semi-log of the dependent variable functional form. As such, the consumer surplus per trip is simply the absolute value of the reciprocal of the Travel Cost coefficient. See Parsons (2017) or Haab and McConnell (2002) for more details on count data models.
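A minimal sketch of a single-site Poisson trip frequency model in the spirit of Eq. (4) follows; the simulated survey data and the "true" coefficient values are illustrative assumptions.

```python
# A minimal sketch of a single-site Poisson trip frequency model like Eq. (4)
# on simulated data; all data-generating values are assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 400

tc = rng.uniform(10, 200, n)            # round-trip travel cost ($)
tc_sub = rng.uniform(20, 250, n)        # travel cost to nearest substitute ($)
inc = rng.uniform(30, 120, n)           # income ($1,000s)

# Assumed "true" demand in the exponential (count data) form
mu = np.exp(1.5 - 0.012 * tc + 0.004 * tc_sub + 0.003 * inc)
trips = rng.poisson(mu)

X = sm.add_constant(np.column_stack([tc, tc_sub, inc]))
fit = sm.GLM(trips, X, family=sm.families.Poisson()).fit()
b_tc = fit.params[1]

# With an exponential demand function, consumer surplus per trip is the
# absolute value of the reciprocal of the travel cost coefficient.
print(f"CS per trip: ${1 / abs(b_tc):,.2f}")   # about 1/0.012, roughly $83
print(f"CS per visitor per season: ${trips.mean() / abs(b_tc):,.2f}")
```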

5.2 Valuing Quality Changes Using Trip Frequency Models

In addition to valuing overall site access, there is often interest in answering a wider range of management and policy relevant questions associated with changes in the quality or attributes of a recreation site. For instance, there may be interest in understanding:

• How WTP would change given changes in the size of the recreation area protected.
• How recreation benefits would change with management enhancements such as additional hiking trails, improved water quality, or enhanced wildlife populations. These marginal benefits can be compared to the marginal costs of carrying out the management action to determine if the added benefits justify the added costs.
• The value of a change in site quality from allowing an incompatible use to occur at or nearby the site, such as drawing the reservoir level down for irrigation, reducing river flows to produce hydropower, or allowing a nearby mine that adds pollution to a lake.

A single-site trip frequency model can be used to value changes in site quality by gathering additional data on the number of trips that would be taken under different levels of site quality. Visitors could be asked about the number of trips they currently take to the recreation site, as well as the number of trips they would have taken if, for example, their hunting success had been twice as high as the current rate, or water quality at the site had been twice as good as the current level. One approach to analyzing these data is to estimate a single demand equation that pools data from the actual and intended behavior and includes site quality as an independent variable (similar to Eq. (5) below). With this single-equation model, the consumer surplus is first calculated under the demand curve with site quality set at the current level, that is, the future without the management action. Then


the site quality variable is set at the new level, the future with the management action, which essentially shifts the demand curve. The consumer surplus is then recalculated from this shifted demand curve. The difference in the two consumer surplus calculations is the marginal WTP for the change in quality. An alternative approach is to estimate two separate demand models: one based on the number of trips taken under the current site quality and another based on the number of trips taken under the hypothetical change in site quality (Parsons 2017). Consumer surplus is then estimated in the same way as in any single-site trip frequency model. The difference in consumer surplus between the two models gives an estimate of the value of the quality change (e.g., moving from current water quality to water quality that is twice as good). Of course, these approaches require the collection of additional stated/contingent behavior data from visitors and the ability to accurately depict or describe the change in site quality in a way that is understandable to the visitor. In the absence of contingent behavior data, observing how visitation changes with, for instance, the size of a water body or environmental quality requires actual variation in recreation site quality or characteristics. While these attributes are generally fixed at a single recreation site, they usually vary across different recreation sites. Therefore, if the analyst pools or combines visitation data from several recreation areas which have varying levels of these attributes, then visitor response to these attributes can be estimated in the demand coefficients. This allows the analyst to estimate how the demand curve shifts given changes in a site attribute. The area between the original demand curve and the demand curve with increased size or level of environmental quality provides an estimate of the incremental or additional WTP for the increased amount of the attribute. Equation (5) specifies what a stylized multiple-site trip frequency demand model would look like for individual i visiting site j:

$AnTrips_{ij} = \beta_0 - \beta_1 TC_{ij} + \beta_2 Inc_i + \beta_3 Z_i + \beta_4 SS_j + \beta_5 SQ_j$   (5)

where AnTripsij is the annual number of trips of visitor i to site j, TCij is round trip travel cost of visitor i to site j (transportation cost and time cost), Inci is income of visitor i, Zi is a vector of characteristics specific to visitor i, such as age, SSj is the Site Size of site j (e.g., number of acres) and SQj is the Site Quality of site j (e.g., water clarity, fish catch). The coefficients on the Site Size and Site Quality variables indicate how trips change with a one-unit change in Site Size and Site Quality. That is, how much will the demand curve shift with a one-unit change in site quality? It is this shift in the demand curve that allows calculation of the marginal benefit of the quality change. This calculation is done by integrating the area between the current and changed (positively or negatively) demand curve and expanding that to the population of visitors at the site. While it is possible to value quality changes using trip frequency demand models, the multiple-site selection models described in the next section are often more appropriate for valuing site quality (Parsons 2013).
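The consumer surplus arithmetic behind this demand shift can be illustrated with a short numeric sketch; the coefficients below are assumed, as if estimated from pooled actual and contingent trip data, and the quality index is hypothetical.

```python
# A small numeric sketch of valuing a quality change with a count-data
# (exponential) demand model like Eq. (5); all values are assumptions.
import numpy as np

b0, b_tc, b_q = 1.2, -0.012, 0.35       # intercept, travel cost, site quality
tc = 60.0                                # a visitor's round-trip travel cost
q0, q1 = 2.0, 3.0                        # water clarity index, before/after

trips0 = np.exp(b0 + b_tc * tc + b_q * q0)
trips1 = np.exp(b0 + b_tc * tc + b_q * q1)

# For exponential demand, CS = trips / |b_tc|, so the value of the quality
# change is the difference in CS under the original and shifted demand curves.
cs0, cs1 = trips0 / abs(b_tc), trips1 / abs(b_tc)
print(f"trips per season: {trips0:.2f} -> {trips1:.2f}")
print(f"seasonal CS: ${cs0:,.2f} -> ${cs1:,.2f}; "
      f"value of the quality change: ${cs1 - cs0:,.2f}")
```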

5.3 Multiple-Site Selection Models

Since the 1990s, multiple-site selection models have become quite popular. These models view the potential visitor as selecting a site to visit from a large choice set of possible recreation sites. These sites differ in terms of travel cost to the site and site quality. The individual is assumed to select the site which maximizes their utility given their budget constraint. A repeated discrete choice model has the visitor repeatedly making this site selection decision for each choice occasion (e.g., a day or a weekend) over the season and then sums up these trips across all choice occasions in a season. The theoretical foundation of this model is known as a random utility maximization (RUM) model since not all the variables in the visitor's utility function are believed to be observable to the analyst. Thus, some of these unobservable variables are treated as random by the analyst, hence the name random utility model. Nonetheless, the site selected by the visitor reflects the one site on any given choice occasion that the visitor views as giving them the highest net utility. By dividing this utility by the negative of the coefficient on travel cost (which is also interpreted as the marginal utility of income), a monetary measure of WTP is calculated. The versatility of this model is that, being a multiple-site model, it can readily value changes in site quality or closure of one or more sites. The strong suit of this model is its ability to capture the influence of substitute sites on the individual's choice of a recreation site to visit. Thus, the loss of value from closing one site is just the incremental loss in utility from having to visit the second-best site. The econometric specification of multiple-site selection models is quite different from that of trip frequency demand models. The dependent variable now takes on a value of one for the site visited on a particular choice occasion and zero for the remaining sites in the set of recreation sites available to the visitor on that occasion. The individual's travel cost is still included as the "price" of a recreation visit; however, an additional vector of site attributes (e.g., size of the site, water quality) is also included. A discrete choice or qualitative response model such as a multinomial logit is often estimated. With this model, an increase in environmental quality at one site (call it site A) is reflected by some visitors switching away from other sites to visit site A. By linking a multinomial logit site choice model to the trip frequency model discussed above, the analyst can also estimate the benefits of a change in site quality on both site selection and trip frequency. Herriges and Kling (1999) as well as Haab and McConnell (2002) and Parsons (2017) provide an in-depth discussion of these models.
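A minimal numeric sketch of the welfare calculation in a multinomial logit site selection model follows; the coefficients and the three-site choice set are assumptions for illustration.

```python
# A minimal sketch of RUM welfare measurement: WTP for a quality change is
# the change in the "log-sum" (expected maximum utility) divided by the
# marginal utility of income, i.e., the negative of the travel cost
# coefficient. All coefficients and site data below are assumed.
import numpy as np

b_tc, b_q = -0.05, 0.8                   # travel cost and quality coefficients
tc = np.array([20.0, 35.0, 50.0])        # travel costs to sites A, B, C
q0 = np.array([3.0, 4.0, 2.0])           # baseline site quality
q1 = q0 + np.array([1.0, 0.0, 0.0])      # improve quality at site A only

def logsum(q):
    v = b_tc * tc + b_q * q              # systematic utility of each site
    return np.log(np.exp(v).sum())

wtp_per_occasion = (logsum(q1) - logsum(q0)) / abs(b_tc)
print(f"WTP per choice occasion: ${wtp_per_occasion:.2f}")

# The improvement at site A also shifts choice probabilities toward site A:
for q, tag in [(q0, "before"), (q1, "after ")]:
    p = np.exp(b_tc * tc + b_q * q)
    print(tag, np.round(p / p.sum(), 3))
```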

5.4 Data Requirements for Travel Cost Models

For travel cost models, obtaining individual-level data on travel costs and the number of trips taken usually requires a survey of visitors. If a single-site model is being estimated, the task is quite simple: the researcher focuses only on one site, either conducting the survey on-site or obtaining visitors' names and contact information to send them the survey. However, with multiple-site trip frequency or site selection models, visitation data for many sites are needed. This increases the data collection


costs, especially if on-site surveys are used. Alternatively, for some activities where licenses are required, such as hunting and fishing, it may be possible to conduct a mail survey and ask the user about all the sites they visit in one survey. This is of course burdensome on the respondent and may reduce the overall survey response rate. However, the pay-off from such a detailed survey is the ability to value changes in site quality and account for the availability of substitute sites when calculating the demand function and consumer surplus.

6 Stated Preference Models

When the change in environmental quality is outside the previously observed range, or the desired value is one of nonuse, the analyst cannot rely on revealed preference methods because no actual behavior is available from which to infer people's preferences. Instead, economists can construct or simulate a market or a voter referendum to ask people how much they would pay if quality were improved or a unique natural environment protected. The first stated preference method discussed below is called the contingent valuation method, and the other is a newer stated preference method called the conjoint or choice experiment method.

The two stated preference methods (contingent valuation and choice experiments) share many similarities in that:

(a) A resource scenario is described to respondents in words, often supplemented by graphs, diagrams, drawings, or pictures to clearly communicate what the resource being valued is, as well as the quantity and quality of that resource. The scenario includes a baseline status quo with no additional cost and then one or more action alternatives with an associated cost to the respondent.
(b) A means of payment by which the respondent pays the cost of provision of the increased quantity or quality of the natural resource or public good is included. The means of payment is tailored to the scenario; for instance, if the focus is on nonuse value, then some form of increased taxes (income, sales, property) or utility bill would be explained as the mechanism being used to finance the increment of the public good.
(c) The WTP question is typically a discrete choice, with the respondent being asked if they personally would pay this amount (e.g., in a recreation setting) or vote to pay this amount (e.g., in a public goods setting). The magnitude of the monetary amount varies across the sample, allowing a quasi-inverse demand curve to be estimated.

Given the discrete nature of the WTP question, a logit or probit regression model is often estimated to calculate the sample average maximum amount respondents would pay for the public good or improvement in quality.

6.1 Contingent Valuation Method

Typically, the contingent valuation method is used to estimate a single WTP value for a single scenario offering just one combination of quantity and quality of a public good. For example, in the Exxon Valdez oil spill contingent valuation study, a one-time WTP for the single scenario of avoiding another equivalently large and


damaging oil spill was elicited using in-person interviews (Carson et al. 2003). A similar approach was taken in the contingent valuation study used to assess damages following the 2010 Deepwater Horizon oil spill (Bishop et al. 2017). However, some contingent valuation surveys provide multiple scenarios along a common quantity or quality scale. Then, a series of WTP questions are asked, allowing estimation of a WTP function for that increasing quality or quantity of a public good. For example, Walsh et al. (1984) asked respondents their annual WTP for four different amounts of land protected as wilderness. Multiple regression was then used to estimate WTP as a function of acreage protected along with demographics of the visitor. In terms of the format of the WTP question, Carson et al. (2003) and Bishop et al. (2017) used the closed-ended approach in their in-person interviews, where respondents were asked if they would pay a particular monetary amount which varied across the sample. Typically at least five, and more often ten, different levels of the monetary amount are asked so as to estimate the quasi-inverse demand curve. An example scenario and a binary closed-ended, or dichotomous choice, referendum WTP question format used by Loomis (1996) for a contingent valuation survey focused on dam removal is as follows:

    If an increase in your federal taxes for the next 10 years cost your household $YY each year to remove the two dams and restore both the river and fish populations (which was previously described in the survey), would you vote in favor?  YES  NO

In each survey, the $YY was randomly filled in with one of 15 different bid levels ranging from $3 to $190, with most of the bid levels falling between $15 and $45. To estimate the quasi-inverse demand curve, a binary logit model of the following stylized form might be estimated, as in Eq. (6):

$\log(\mathrm{ProbYes} / (1 - \mathrm{ProbYes})) = \beta_0 - \beta_1\,\$Bid + \beta_2 X_2 + \ldots + \beta_k X_k$   (6)

where $Bid is the $YY level asked of the particular respondent, the Xs are the values of the non-bid independent variables that may represent tastes and preferences toward the resource of interest, and the βs are response coefficients for each variable. From Eq. (6), median WTP is calculated as:

$\mathrm{Median\ WTP} = (\beta_0 + \beta_2 X_{2m} + \ldots + \beta_k X_{km}) / |\beta_1|$   (7)

where X2m, ..., Xkm are the means of the other non-bid variables. Collectively, the constant term β0 plus the sum of all the products is sometimes called the grand constant. In this example of a linear WTP model, mean WTP is calculated using the same formula as Eq. (7). Another common approach is to use an exponential WTP model, which replaces the bid amount with the natural log of the bid amount, thus restricting WTP to be nonnegative. In this case, mean and median WTP estimates are obtained using slightly different formulas (see Haab and McConnell 2002). There are also simple, less restrictive models that can be used to derive a lower bound estimate of WTP, such as the Turnbull estimator. This estimator does not assume any particular


distribution for WTP and instead relies only on the fact that a respondent who answers “yes” to an offered bid amount has a WTP that is no less than that amount (Haab and McConnell 2002). A similar nonparametric approach was used by Bishop et al. (2017) in their assessment of damages from the Deepwater Horizon oil spill. Many early contingent valuation method studies from the late 1970s through the 1980s used an open-ended WTP question format, where the individual writes into the survey the maximum amount they would pay. This can be analyzed using simple descriptive statistics or ordinary least squares regression. Another popular technique for mail surveys is the payment card, where individuals circle one of the preprinted monetary amounts representing the maximum amount they would pay. See Boyle (2017) for a more complete description of these alternative WTP question formats, and Haab and McConnell (2002) for details on the various econometric models associated with each question format. If use values are obtained from a contingent valuation study, then these values are expanded to the user population. For example, if the WTP of asthmatics to reduce air pollution is obtained, the sample would be generalized to exogenous estimates of the number of asthmatics in the population of interest. However, if nonuse values such as existence values for a pure public good like protection of the Grand Canyon or an endangered species are estimated, the relevant public could conceivably be the entire country. For the interested reader, an edited book on contingent valuation is Alberini and Kahn (2006). This book provides chapters that include updated guides for designing and implementing a contingent valuation survey, econometric methods, and applications of contingent valuation. Another more recent reference outlining the steps involved in conducting a contingent valuation study is Boyle (2017).
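Pulling these pieces together, the sketch below fits a binary logit like Eq. (6) to simulated referendum responses, computes median WTP as in Eq. (7), and adds a Turnbull lower bound; the bid design and the "true" WTP distribution are illustrative assumptions, and the monotonization step is a crude stand-in for the pooling used in practice.

```python
# A minimal sketch of dichotomous choice contingent valuation analysis on
# simulated yes/no data; all data-generating values are assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 600
bids = rng.choice([3, 15, 25, 35, 45, 90, 190], n).astype(float)
true_wtp = rng.lognormal(mean=3.3, sigma=0.8, size=n)  # assumed distribution
yes = (true_wtp >= bids).astype(int)

X = sm.add_constant(bids)
fit = sm.Logit(yes, X).fit(disp=False)
b0, b1 = fit.params
# Median WTP solves b0 + b1*Bid = 0; since b1 < 0, this equals b0/|b1|.
print(f"median WTP: ${-b0 / b1:,.2f}")

# Turnbull lower bound: the yes-share at each bid gives a survivor function;
# each interval's probability mass is valued at the interval's lower bid.
levels = np.sort(np.unique(bids))
shares = np.array([yes[bids == b].mean() for b in levels])
shares = np.minimum.accumulate(shares)   # crude monotonization for the sketch
mass = -np.diff(np.append(shares, 0.0))  # P(WTP in [t_j, t_{j+1}))
print(f"Turnbull lower-bound mean WTP: ${np.sum(levels * mass):,.2f}")
```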

6.2 Choice Experiments

In some cases, policy makers do not have a well-defined single scenario but rather are interested in the values of individual natural resource management actions that might be combined into an overall management program or project. For example, when restoring wetlands, emphasis could be placed on providing habitat for an endangered species, but this might require prohibition of all hunting, wildlife viewing, and camping. Alternatively, the area could be managed for waterfowl hunting in one area, wildlife viewing in another, and camping in another part of the wetland. Each of these management options has different direct monetary costs and opportunity costs in terms of other foregone options. Policy makers and managers may want to know which of the many possible combinations of management actions would yield the greatest overall net benefits. Choice experiments are designed to answer these questions by estimating the marginal values or "part-worths" of each management option or attribute.

Thinking about this from the viewpoint of the marketing literature, where this method originated, different combinations of management options yield different "product profiles." Table 1 provides an example of a choice set focused on wetland restoration. In this example, Restoration Option A comprises 200 acres, with 100% threatened and endangered (T&E) species habitat and zero acres available for hunting and wildlife viewing. This option has an associated cost to taxpayers of $50 per year. Restoration Option B might be 200 acres of wetland with one-third of those acres available for waterfowl hunting, two-thirds for wildlife viewing, and zero for T&E habitat. This option has an associated cost of $75 per year. The No Action (status quo) or Current Situation usually has a zero cost and serves as a baseline against which to compare the proposed alternatives. In our example, the area may currently be a "de facto" wetland caused by excess agricultural drainage and used primarily as a "duck club" for private hunting.

Table 1 Example choice set #1

Allocation of restored wetland   No action/current situation   Restoration option A   Restoration option B
(Acres and % of land)
T&E Species                      0 Acres and 0%                200 Acres and 100%     0 Acres and 0%
Hunting                          160 Acres and 80%             0 Acres and 0%         66 Acres and 33%
Viewing                          40 Acres and 20%              0 Acres and 0%         134 Acres and 67%
Annual Cost per taxpayer         $0.00                         $50.00                 $75.00
Choose One                       [ ]                           [ ]                    [ ]

These product profiles are presented to survey respondents as choice sets in a table such as Table 1. Table 1's "Choose One" format is typical of most choice experiments and is consistent with the standard random utility formulation that underlies most choice experiments and recreation site selection models. However, this Choose One format does not obtain a great deal of valuation information from each choice (i.e., it is statistically inefficient). One solution typically used is to ask each respondent several of these choice sets. Another is to have the individual rank the options or choose the most preferred (best) and least preferred (worst).

In a choice experiment, there are a large number of possible combinations of selected attributes and their levels. This yields a considerable number of possible choice sets, the exact number depending on how many levels of the attributes are included. If there are eight levels of cost included to get a precise estimate of the critical "price coefficient" and five levels of each of three additional attributes, for example, then there are 8 × 5 × 5 × 5 = 1,000 possible combinations in what is called a full factorial design. A more compact design with fewer combinations is a fractional factorial design, such as an orthogonal design, usually focusing on just the main effects (or what will be the regression coefficient on that variable). In our example, a main effects design has 24 different product profiles (i.e., Restoration Options). The particular 24 combinations that minimize collinearity among the attribute levels can be determined using a statistical software package, such as SAS or Ngene (Holmes et al. 2017).¹

¹Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. government.


The next design decision is how many of these 24 combinations to give each respondent. Although there is no definitive rule, presenting four to six choice sets per respondent is common (Holmes et al. 2017). Once the survey versions are assembled and administered, analyzing the resulting data depends on the format of the choice question. Our example in Table 1 is typical, with three options per choice set. A multinomial logit model is typically estimated when there are three or more options in a choice set. If there are just two options (i.e., the current status quo and one "action" option), then analysis is similar to a dichotomous choice contingent valuation question and uses a binary logit model. The multinomial econometric specification for a choice example as depicted in Table 1 would be:

$\mathrm{Prob}(i \mid 3) = \dfrac{\exp(\beta_0 + \beta_1 A_{i1} + \beta_2 A_{i2} + \beta_3 A_{i3} + \beta_p \$A_i)}{\sum_{j=1}^{3} \exp(\beta_0 + \beta_1 A_{j1} + \beta_2 A_{j2} + \beta_3 A_{j3} + \beta_p \$A_j)}, \quad i = 1, 2, 3$   (8)

where, in our example with just three options, the sum in the denominator runs over the three alternatives. The individual is essentially comparing the value of the noncost attributes with the cost attribute to select the bundle that maximizes their relative utility: in our example, option 1 versus options 2 and 3. See Holmes et al. (2017) for a more in-depth treatment of the econometric models for these types of choice experiment data. Once the coefficients from Eq. (8) are estimated, the marginal values of each attribute are calculated by dividing the attribute coefficient by the absolute value of the coefficient on cost. With these marginal values, the overall economic value of different management options can be calculated. Comparing the values of all the different management options allows the analyst to determine the particular combination that yields the greatest value. As sometimes happens, the choice experiment survey may have to be conducted before managers have exogenously arrived at a subset of likely options based on other criteria. However, once these options are known, the choice experiment results can be used to value these particular management options for inclusion within a benefit-cost analysis. This flexibility to value options that are not identical to what was asked in the survey is also an advantage in conducting many benefit transfers (see Rolfe et al. 2015 for a discussion of the advantages of choice experiments for benefit transfer).
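A short numeric sketch of these part-worth and option-value calculations follows; the coefficient values are assumed for illustration, as if estimated from choice sets like Table 1.

```python
# A minimal sketch of turning estimated choice experiment coefficients into
# part-worths and option values, following Eq. (8); all values are assumed.
import numpy as np

b_te, b_hunt, b_view = 0.012, 0.006, 0.009   # per-acre attribute coefficients
b_cost = -0.03                               # coefficient on annual cost

# Marginal value (part-worth) of each attribute: beta_attr / |beta_cost|
for name, b in [("T&E acre", b_te), ("hunting acre", b_hunt),
                ("viewing acre", b_view)]:
    print(f"marginal value per {name}: ${b / abs(b_cost):.2f} per year")

# Value of a whole option relative to the status quo (0/160/40 acres):
def utility(te, hunt, view):
    return b_te * te + b_hunt * hunt + b_view * view

base = utility(0, 160, 40)
for tag, acres in [("Option A", (200, 0, 0)), ("Option B", (0, 66, 134))]:
    wtp = (utility(*acres) - base) / abs(b_cost)
    print(f"{tag}: WTP relative to status quo = ${wtp:.2f} per year")
```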

6.3 The Issue of Bias in Stated Preference Surveys

A commonality of all stated preference methods is the concern about hypothetical bias, i.e., that stated WTP is not equal to the survey respondents’ actual WTP. If hypothetical bias exists, stated WTP is not a valid indicator of “true” WTP. Economists have been concerned about this potential problem and have studied hypothetical bias for decades. The issue entered the mainstream of economics during the early 1990s, when contingent valuation was being applied to estimate the reduction in passive use values from the Exxon Valdez oil spill in Alaska. With hundreds of millions of dollars at stake, the strengths and weaknesses of the contingent valuation method were debated in the Journal of Economic Perspectives in 1994. This debate was revisited in 2012 in the Journal of Economic Perspectives to investigate what progress had been made during the intervening 20 years (see Kling et al. 2012 for a summary of the original debate and the evidence regarding improvement).

While the literature on hypothetical bias is voluminous (see Loomis 2011 for a summary and Loomis 2014 for strategies for overcoming it), a few key results are worth noting. First, with use values, the bias is not always present. Studies that compare revealed preference techniques such as the hedonic property method to stated preference techniques such as the contingent valuation method show no statistical difference in WTP (Brookshire et al. 1982). Comparisons of benefit estimates from travel cost models and the contingent valuation method for recreation use values show, on average, no hypothetical bias (Carson et al. 1996). However, the less familiar the person is with the good being valued, the more likely hypothetical bias will be present. Thus, for the subset of public goods whose benefits are largely existence or passive use values, and for which people do not have firsthand knowledge or prior experience, valuation tends to show significant hypothetical bias (Champ et al. 1997).

In response, efforts have been made in survey design to reduce hypothetical bias via exhortations to respondents to behave as if it were a real market where they really must pay their own money (Cummings and Taylor 1999). Since the publication of Champ et al. (1997), ex post calibrations of WTP values based on respondent uncertainty have also been frequently used to reduce hypothetical bias. A more recent strategy is to make the survey consequential to the respondent by stressing that responses to the WTP question will have an effect on the ultimate project/policy outcome (Vossler and Evans 2009).

Several other stated preference survey instrument design issues have been labeled as biases. One frequent concern is payment vehicle bias, which occurs if WTP is influenced by how a respondent pays, for example, via an income tax versus a utility bill. WTP response format bias occurs if WTP is influenced by whether the valuation question is asked in an open-ended format or a closed-ended format such as a dichotomous choice or payment card format. As noted by Carson and Groves (2007), the binary dichotomous choice question format is potentially incentive compatible while open-ended formats are not. See Bishop and Boyle (2017) for a discussion of these and other biases.

7 Combining Stated and Revealed Preference Methods and Data

Both stated preference and revealed preference methods have their strengths and weaknesses when estimating use values, such as those that might arise from reductions in urban air pollution. This insight suggested that combining revealed preference and stated preference data in environmental valuation might capitalize on their respective strengths while minimizing their weaknesses (Whitehead et al. 2011). In particular, this combination of approaches might help reduce the influence of any hypothetical bias on the WTP estimates. The marketing literature had been using this approach for more than a decade for a number of purposes, including testing for hypothetical bias (see Louviere et al. 2001).

Since the early 1990s, there has been an explosion of combined revealed preference and stated preference studies, particularly in the context of outdoor recreation. The most recent compendium of state-of-the-art papers on combining revealed and stated valuation approaches is Whitehead et al. (2011). This book illustrates the wide variety of applications the combined revealed preference and stated preference method has been used for, including pesticide risk reduction, seafood, reservoir operations, and recreation.

Of course, a reasonable question one might ask is: “If you have revealed preference data, why would you want to combine it with stated preference data?” There are several reasons, all related to limitations in relying solely on revealed preference data:

(a) Revealed preference data may not have sufficient natural variation in amenities or environmental quality to estimate a statistically significant coefficient. This could arise because of limited data availability (e.g., only one year of data rather than a time series being available) or because there is a lack of natural variation in the quality or amenity attribute.
(b) The attributes may be highly correlated in the data set, so that it is nearly impossible to estimate a statistically significant coefficient on each of them separately (e.g., air quality and traffic congestion).
(c) The policy being valued would result in changes in the quality or level of the amenity that are outside the current range of the data.
(d) The introduction of a private good with a new attribute (e.g., locally grown organic corn), or of a new public good similar but not identical to existing public goods, is being evaluated.

8 Benefit Transfer

Economists with state and federal agencies are often asked to perform benefit-cost analyses and assessments of damages to public resources under restrictive time and budget constraints that inhibit the collection of primary data. For instance, there may be a need for an estimate of the benefits and costs of a time-sensitive policy proposal for which there is not a sufficient budget or time to conduct a survey. For such cases, environmental economists have developed a set of protocols to transfer existing valuation estimates from previously conducted revealed preference and stated preference studies to evaluate the new policy in question. There are four main types of benefit transfer:

(a) a point estimate value transfer from the most similar study;
(b) an average of the values from the prior literature’s most similar studies;
(c) transferring the demand or WTP function from the prior study to the new policy study; and
(d) using a meta-analysis regression equation estimated on past valuation studies to calculate what the value per unit (e.g., visitor day, household) would be at the new policy site.

See Johnston et al. (2015) for a comprehensive discussion of benefit transfer methods.

In principle, demand or WTP function transfers and transfers based on meta-analysis regression models have the advantage of being able to adapt the values from the existing literature to better match the criteria for an ideal benefit transfer than would a simple transfer of average values from the literature. Transferring a WTP function that contains demographic variables such as income and age allows the demand function to be tailored to the socio-demographics surrounding the policy site. As discussed below, the WTP function approach usually reduces the benefit transfer error compared to transferring point estimates (Rosenberger and Loomis 2017).

Meta-analysis involves a regression model with the value per unit (e.g., recreation day, acre of wetland, household) as the dependent variable and study site characteristics as the independent variables. There have been more than a dozen meta-analyses of environmental and natural resources, including water quality improvements (Johnston et al. 2016; Johnston and Thomassin 2010), game fish caught (Johnston et al. 2006), recreational hunting (Huber et al. 2018), river restoration improvements (Bergstrom and Loomis 2017), salmon preservation (Weber 2015), and conservation of threatened, endangered, and rare species (Richardson and Loomis 2009). See Nelson and Kennedy (2009) for additional examples. Using a meta-analysis regression equation as a benefit transfer tool has three potential advantages over average value transfer in terms of an ideal benefit transfer:

(a) the ability to interpolate a value for a particular public good in a particular region that might not exist in the published literature (e.g., fish species X in region Y might not be available in the literature, only fish species Z in region Y or fish species X in region R);
(b) the ability to incorporate a nonlinear relationship between the value per unit and the quantity change (e.g., additional acres of wetlands may not have a constant value per acre, as an average value transfer implicitly assumes); and
(c) the ability to account for other attributes of the good being valued (e.g., distinguishing between the value of a recreation activity on public land vs. private land).

Meta-analyses for benefit transfer are discussed in more detail in Bergstrom and Taylor (2006) and Johnston et al. (2015).

Interest in determining the accuracy of benefit transfer, and especially comparing the accuracy of meta-analysis and average value transfers, has spawned a substantial literature. This literature compares original study values with benefit transfer estimates of those same values to calculate the error of benefit transfer. Rosenberger (2015) and Rosenberger and Loomis (2017) cataloged the various estimates of benefit transfer errors from value transfers (e.g., point estimates or average values) and function transfers (primarily meta-analyses). While most of the value transfer errors are in the range of 4% to 40%, several are off by 100% to 200% (and occasionally more), making the mean percentage transfer error 140%. Benefit function transfer generally performs better, with a median error of 36% and a mean error of 65%, but it too can result in errors of 200% or more (Rosenberger 2015).

One of the tools for improving the accuracy of benefit transfer is for analysts to have access to comprehensive databases and benefit functions. Significant progress has been made in this area in the last three decades. A major advance came with the Environmental Valuation Reference Inventory (EVRI), designed by Environment Canada in the early 1990s in collaboration with international valuation experts and organizations from Australia, Canada, France, New Zealand, the United Kingdom, and the United States.
EVRI includes an extensive database of economic valuation studies focused on a variety of environmental assets (e.g., air quality, water quality, wildlife, recreation) and human health effects. More recently, the US Geological Survey (USGS), in partnership with the Bureau of Land Management and the National Park Service, developed the Benefit Transfer Toolkit. This toolkit includes databases of valuation studies, average value tables, and meta-analysis regression functions, as well as an interactive map of studies focused on the value of outdoor recreation. In addition, Oregon State University provides the most comprehensive database of recreation use values, most recently updated in 2016 (Recreation Use Values Database 2016; Rosenberger et al. 2017). These databases are summarized in Table 2.

Overall, each of the benefit transfer methods discussed here has its strengths and weaknesses, and the choice of which approach to use is at times driven by the availability of data, or lack thereof. For example, if there are no similar studies for a specific geographic region of policy interest, then a meta-analysis may be the best approach, provided a previously estimated meta-analysis regression model can be identified. If not, then an average of past valuation studies might be the best estimate the analyst can base the transfer on given the time available to conduct the benefit transfer. Previous research has found that simple mean value transfers may be better suited for transfers between relatively similar sites, whereas function transfers may be preferred when transferring between less similar sites (Bateman et al. 2011). However, any of these benefit transfer approaches is likely better than completely omitting a monetary value for a particular recreation activity, health effect, or impact to wildlife or water quality. Often, the net result of such an omission from a benefit-cost analysis is an implied value of zero. Benefit transfer, while not as accurate as conducting a primary study, is typically more accurate than an estimate of zero.
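As a minimal sketch of a function transfer, the snippet below plugs a policy site’s characteristics into an assumed meta-regression; the coefficients and variable names are invented for illustration, whereas a real transfer would use a published model such as those cataloged in the databases of Table 2.

```python
import numpy as np

# Invented meta-regression coefficients for ln(value per visitor day)
b0, b_ln_income, b_public_land = 1.2, 0.35, 0.18

def transferred_value_per_day(median_income, on_public_land):
    """Predict a per-day use value at an unstudied policy site by plugging
    its characteristics into the (assumed) meta-regression of ln(value),
    illustrating function transfer rather than average-value transfer."""
    ln_v = (b0 + b_ln_income * np.log(median_income / 1000.0)
            + b_public_land * on_public_land)
    return np.exp(ln_v)

# Tailoring the transfer to the policy site's socio-demographics
print(round(transferred_value_per_day(62_000, on_public_land=1), 2))
```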

9 Conclusion

The overall purpose of this chapter can perhaps be described in a few sentences. Economic theory provides a consistent measure with which to value market goods and nonmarket environmental externalities and public goods. In competitive markets, the market price is precisely the willingness to pay for one more unit of the good. Where market prices do not exist, economists can infer willingness to pay using revealed preference methods, or by using a “constructed or simulated market” that asks respondents to state their willingness to pay. These revealed preference and stated preference methods are based on the same utility maximization framework economists use to estimate the demand for market goods. While the statistical details of estimating an econometric model for recreation differ slightly from those of estimating the demand for gasoline, the basic structure of the data (e.g., cross-sectional data) and the econometric issues often have more in common than one might think.

What cannot be summarized in a few sentences is the wide variety of variations on these revealed and stated preference methods. These variations arise from the need to tailor the valuation to the unique types of public goods being evaluated. As highlighted in this chapter, the economists’ toolkit includes a wide variety of methods that can be applied to value nearly every type of public good that is commonly dealt with in benefit-cost, policy, or regulatory analyses.


Table 2 Nonmarket valuation databases

Benefit Transfer Toolkit
Regions covered: United States
Description: Contains databases of recreation use values, as well as total economic values for threatened and endangered species, salmon preservation, and water quality improvements. Also includes average value tables, meta-regression models, and an interactive map. Developed by the US Geological Survey, in collaboration with the National Park Service, Bureau of Land Management, Colorado State University, and Oregon State University.
Website: https://my.usgs.gov/benefit-transfer

Environmental Valuation Reference Inventory (EVRI)
Regions covered: International
Description: Database of more than 4,000 summaries of valuation studies searchable by title, author, date of study, continent, environmental asset type, and valuation technique. Registration required. Developed by Environment Canada in collaboration with the U.S. EPA and other international experts and organizations.
Website: https://www.evri.ca

New Zealand Non-Market Valuation Database
Regions covered: New Zealand
Description: Database of nonmarket valuation studies conducted in New Zealand. Includes study reference, mean value, methodology, and reviews of the study if available.
Website: http://selfservice.lincoln.ac.nz/nonmarketvaluation

Recreation Use Values Database
Regions covered: United States, Canada
Description: Database containing more than 3,000 recreation use value estimates (per person per day) from more than 400 economic studies conducted between 1958 and 2015. Detailed information about each study is reported in the database. Developed by Dr. Randy Rosenberger at Oregon State University.
Website: http://recvaluation.forestry.oregonstate.edu

ValueBase
Regions covered: Sweden
Description: Database containing detailed information about economic valuation studies on environmental change in Sweden. Developed at the Beijer Institute of Ecological Economics within a project funded by the Swedish Environmental Protection Agency.
Website: http://www.beijer.kva.se/valuebase.htm


Environmental valuation theory and methods are evolving areas of research. While environmental valuation originated in the desire to value recreation in public water projects, it quickly saw application to valuing air and water quality in benefit-cost analyses of environmental regulation. Environmental valuation rose onto the popular press’ radar screen with the application of valuation methods to natural resource damage assessments, including large oil spills. In recent decades, as recognition has grown that the environment provides valuable ecosystem services to people, all the valuation methods discussed above, and stated preference methods in particular, have been employed to monetize these values. Interest in developing computer packages that allow government agencies to monetize ecosystem services relies extensively on benefit transfer, which highlights the importance of conducting stated and revealed preference studies that can be added to the existing stock of literature. Environmental valuation techniques continue to see new policy applications, and no doubt there will be many more in the future.

10 Cross-References

▶ Cross-Section Spatial Regression Models
▶ Dynamic and Stochastic Analysis of Environmental and Natural Resources
▶ Housing Choice, Residential Mobility, and Hedonic Approaches
▶ Limited and Censored Dependent Variable Models
▶ Spatial Environmental and Natural Resource Economics
▶ The Hedonic Method for Valuing Environmental Policies and Quality
▶ Travel Behavior and Travel Demand

References

Abdullah S, Markandya A, Nunes P (2011) Introduction to economic valuation methods. In: Batabyal A, Nijkamp P (eds) Research tools in natural resource and environmental economics. World Scientific, Hackensack, pp 143–188
Alberini A, Kahn J (2006) Handbook on contingent valuation. Edward Elgar, Northampton
Arrow K, Solow R, Portney P, Leamer E, Radner R, Schuman H (1993) Report of the NOAA panel on contingent valuation. Fed Reg 58(10):4602–4614
Bateman IJ, Brouwer R, Ferrini S, Schaafsma M, Barton DN, Dubgaard A, Hasler B, Hime S, Liekens I, Navrud S, De Nocker L, Ščeponavičiūtė R, Semėnienė D (2011) Making benefit transfers work: deriving and testing principles for value transfers for similar and dissimilar sites using a case study of the non-market benefits of water quality improvements across Europe. Environ Res Econ 50(3):365–387
Bergstrom JC, Loomis JB (2017) Economic valuation of river restoration: an analysis of the valuation literature and its uses in decision-making. Water Resour Econ 17:9–19
Bergstrom J, Taylor L (2006) Using meta-analysis for benefits transfer: theory and practice. Ecol Econ 60(2):351–360
Bishop RC, Boyle KJ (2017) Reliability and validity in nonmarket valuation. In: Champ P, Boyle K, Brown T (eds) A primer on nonmarket valuation, 2nd edn. Springer, Dordrecht, pp 463–497
Bishop RC, Boyle KJ, Carson RT, Chapman D et al (2017) Putting a value on injuries to natural assets: the BP oil spill. Science 356(6335):253–254
Boyle K (2017) Contingent valuation in practice. In: Champ P, Boyle K, Brown T (eds) A primer on nonmarket valuation, 2nd edn. Springer, Dordrecht, pp 83–131
Brookshire D, d’Arge R, Schulze W, Thayer M (1982) Valuing public goods: a comparison of the survey and hedonic approaches. Am Econ Rev 72(1):165–177
Carson R, Groves T (2007) Incentive and information properties of preference questions. Environ Res Econ 37:181–210
Carson R, Flores N, Martin K, Wright J (1996) Contingent valuation and revealed preference methodologies: comparing the estimates for quasi-public goods. Land Econ 72(1):113–128
Carson R, Mitchell R, Hanemann M, Kopp R, Presser S, Ruud P (2003) Contingent valuation and lost passive value: damages from the Exxon Valdez oil spill. Environ Res Econ 25(2):257–286
Champ P, Brown T, McCollum D (1997) Using donation mechanisms to value nonuse benefits from public goods. J Environ Econ Manage 33(1):151–162
Cummings R, Taylor L (1999) Unbiased value estimates for environmental goods: a cheap talk design for the contingent valuation method. Am Econ Rev 89:649–665
Freeman M (2003) The measurement of environmental and resource values: theory and methods, 2nd edn. Resources for the Future Press, Washington, DC
Haab T, McConnell K (2002) Valuing environmental and natural resources: the econometrics of non-market valuation. Edward Elgar, Northampton
Herriges J, Kling C (eds) (1999) Valuing recreation and the environment: revealed preference methods in theory and practice. Edward Elgar, Northampton
Holmes TP, Adamowicz WL, Carlsson F (2017) Choice experiments. In: Champ P, Boyle K, Brown T (eds) A primer on nonmarket valuation, 2nd edn. Springer, Dordrecht, pp 133–186
Huber C, Meldrum J, Richardson L (2018) Improving confidence by embracing uncertainty: a meta-analysis of US hunting values for benefit transfer. Ecosyst Serv 33:225–236
Irwin EG, Bockstael NE (2001) The problem of identifying land use spillovers: measuring the effects of open space on residential property values. Am J Agric Econ 83:698–704
Johnston RJ, Thomassin PJ (2010) Willingness to pay for water quality improvements in the United States and Canada: considering possibilities for international meta-analysis and benefit transfer. Agric Res Econ Rev 39(1):114–131
Johnston RJ, Ranson MH, Besedin EY, Helm EC (2006) What determines willingness to pay per fish? A meta-analysis of recreational fishing values. Marine Res Econ 21(1):1–32
Johnston RJ, Rolfe J, Rosenberger RS, Brouwer R (eds) (2015) Benefit transfer of environmental and resource values, vol 14. Springer, New York
Johnston RJ, Besedin EY, Stapler R (2016) Enhanced geospatial validity for meta-analysis and environmental benefit transfer: an application to water quality improvements. Environ Res Econ:1–33
Kling C, Phaneuf D, Zhao J (2012) From Exxon to BP: has some number become better than no number? J Econ Perspect 26(4):3–26
Loomis J (1996) Measuring the economic benefits of removing dams and restoring the Elwha River: results of a contingent valuation survey. Water Resour Res 32(2):441–447
Loomis J (2011) What’s to know about hypothetical bias in stated preference valuation studies. J Econ Survey 25(2):363–370
Loomis JB (2014) 2013 WAEA keynote address: strategies for overcoming hypothetical bias in stated preference surveys. J Agric Resour Econ 39:34–46
Loomis J, Yorizane S, Larson D (2000) Testing significance of multi-destination and multi-purpose trip effects in a travel cost method demand model for whale watching trips. Agric Resource Econ Rev 29(2):183–191
Louviere J, Hensher D, Swait J (2001) Stated choice methods: analysis and applications in marketing, transportation and environmental valuation. Cambridge University Press, Cambridge
Mueller J, Loomis J (2008) Spatial dependence in hedonic property models: do different corrections result in economically significant differences in estimated implicit prices? J Agric Res Econ 33(2):212–231
Nelson J, Kennedy P (2009) The use (and abuse) of meta-analysis in environmental and natural resource economics: an assessment. Environ Res Econ 42(3):345–377
Parsons GR (2013) Travel cost methods. In: Shogren JF (ed) Encyclopedia of energy, natural resource, and environmental economics, vol 3. Elsevier, Amsterdam, pp 349–358
Parsons G (2017) Travel cost models. In: Champ P, Boyle K, Brown T (eds) A primer on nonmarket valuation, 2nd edn. Springer, Dordrecht, pp 187–233
Recreation Use Values Database (2016) Oregon State University, College of Forestry, Corvallis. Available at: http://recvaluation.forestry.oregonstate.edu/
Richardson L, Loomis J (2009) The total economic value of threatened, endangered and rare species: an updated meta-analysis. Ecol Econ 68:1535–1548
Rolfe J, Windle J, Bennett J (2015) Benefit transfer: insights from choice experiments. In: Johnston R, Rolfe J, Rosenberger R, Brouwer R (eds) Benefit transfer of environmental and resource values: a handbook for researchers and practitioners. Springer, Dordrecht, pp 191–236
Rosenberger R (2015) Benefit transfer validity, reliability and error. In: Johnston R, Rolfe J, Rosenberger R, Brouwer R (eds) Benefit transfer of environmental and resource values: a handbook for researchers and practitioners. Springer, Dordrecht, pp 307–326
Rosenberger RS, Loomis JB (2017) Benefit transfer. In: Champ PA, Boyle KJ, Brown TC (eds) A primer on nonmarket valuation, 2nd edn. Springer, Dordrecht, pp 431–462
Rosenberger RS, White EM, Kline JD, Cvitanovich C (2017) Recreation economic values for estimating outdoor recreation economic benefits from the National Forest System. Gen Tech Rep PNW-GTR-957. US Department of Agriculture, Forest Service, Pacific Northwest Research Station, Portland, 33 p
Taylor L (2017) Hedonics. In: Champ P, Boyle K, Brown T (eds) A primer on nonmarket valuation, 2nd edn. Springer, Dordrecht, pp 235–292
U.S. District Court of Appeals (for the District of Columbia). State of Ohio vs. U.S. Department of Interior (1989) Case number 86-15755. July 14, 1989
Vossler C, Evans M (2009) Bridging the gap between the field and the lab: environmental goods, policy maker input and consequentiality. J Environ Econ Manage 58:338–345
Walsh R, Loomis J, Gillman R (1984) Valuing option, existence and bequest demands for wilderness. Land Econ 60(1):14–29
Weber MA (2015) Navigating benefit transfer for salmon improvements in the Western US. Front Mar Sci 2(74):1–17
Whitehead J, Haab T, Huang JC (2011) Preference data for environmental valuation: combining revealed and stated approaches. Routledge, New York

The Hedonic Method for Valuing Environmental Policies and Quality

Philip E. Graves

Contents
1 Introduction
2 Value of Statistical Life
3 Hedonic Valuation of Environmental Quality
4 Wage Compensation for Environmental Amenities
5 Property Value Compensation for Environmental Amenities
6 Wage and Property Value Hedonics Are Not Alternatives: The Multimarket Hedonic Method
7 What if Single-Market Hedonic Analyses Are Employed Rather than Multimarket Analyses?
8 Conclusion
References

Abstract

Benefit-cost analysts attempt to compare two states of the world: the status quo and a state in which a policy having benefits and costs is being contemplated. For environmental policies, this comparison is greatly complicated by the difficulty of inferring the values that individuals place on an increment to environmental quality. Unlike ordinary private goods, environmental goods are not directly exchanged in markets with observable prices. In this chapter, the hedonic approach to inferring the benefits of an environmental policy is examined.

Keywords

Environmental Policy · Consumer Surplus · Land Market · Marginal Damage · Environmental Amenity

P. E. Graves (*)
Department of Economics, University of Colorado, Boulder, CO, USA
e-mail: [email protected]; [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_55

1 Introduction

The hedonic approach to valuing environmental benefits has its roots in agricultural economics (see, e.g., Waugh 1928; Vail 1932). Waugh related the price of asparagus, tomatoes, and hothouse cucumbers to various dimensions of perceived quality (e.g., for asparagus: color, size, and uniformity of spears). In another agricultural context, the value of agricultural land has been empirically related to soil fertility and distance from market and, more recently, to ecosystem services. In yet another agricultural context, this method – not yet known as the “hedonic method” – was employed to isolate “quality” changes from fertilizer price indexes as the former related to changing percentages of nitrogen (N), phosphoric acid (P), and potash (K) (see Griliches 1961 for discussion of the early history).

The method first became known as the “hedonic method” as a result of Andrew Court’s (1939) work at General Motors. Court was interested in separating quality improvements from price increases as automobiles improved rapidly during the early decades after their introduction. One can implicitly value horsepower, size, and various other model features with this method, and that valuation could in turn be used to increase GM profit by providing more high-value but low-cost features. In a now-classic article, Solow used what was essentially the hedonic method in a time series context, holding constant measurable inputs to explain GDP growth – his now-famous “residual” (technological change) was seen to account for a quite large percentage of economic growth, a forerunner of modern endogenous growth models. It is Griliches (1961), however, who is generally viewed as the “modern father” of the hedonic method. He introduced many refinements in the method in the context of separating quality improvements from price increases to allow construction of better price indices and more accurate measurement of GDP growth. Early studies tended to focus on either the demand side or the supply side, with Rosen (1974) being the first to present a full general equilibrium discussion; the now-classic Roback (1982) contribution brought the realization that a full general equilibrium requires joint consideration of property value and wage differentials, to which we shall return later in this chapter.

The earliest environmental application of the hedonic method was that of Ridker and Henning (1967). They established that housing prices in St. Louis were higher in cleaner areas, other things equal. There has been a proliferation of property value studies since that time. Valuing water quality is somewhat more difficult with the hedonic method, for reasons beyond the scope of this chapter, and far fewer studies have been conducted for this environmental medium. A relatively limited number of studies have also attempted to value noise from highways and airports as well as hazardous waste dumps. The valuation of each type of environmental amenity generally brings an amenity-specific set of problems, although the focus here will be primarily on air pollution.

2 Value of Statistical Life

This section discusses the first of two distinct areas in which the hedonic method is used in environmental policy; the section that follows deals with the second. The “value of statistical life” (VSL) is used to value mortality damages (in the “health effects” or “sum of specific damages” approach) in public policies of a wide variety, environmental policies being emphasized here. This method employs wage regressions to value the risk of on-the-job death, with riskier jobs requiring higher wages, at least in principle. In these studies, the dependent variable is the wage (or ln wage) of individual workers, which is regressed upon a vector of individual personal characteristics (e.g., age, education, race, sex, experience) and job characteristics (e.g., occupation, industry, unionization). The risk of death, although quite controversially measured, is then included, with an expectation of a positive coefficient to reflect the needed compensation for job risk. The compensation required for the higher risk can then be used to estimate the VSL for use in broader policy contexts.

Suppose, for example, there is a 1/100,000 higher annual probability of dying on the job as a lumberjack than in an average job and that the typical lumberjack (of, say, 100,000 total) requires $50 more per year (2.5 cents/h, with a 2,000-h year) to accept this risk. The expected number of excess deaths is then one, and the aggregate willingness to pay (the VSL) would be $5 million, a number not far from those used in actual public policies. If a particular policy is expected to save 20 lives, with no other effects, it would have $100 million in benefits to be compared to provision costs.

VSL has been inferred in non-labor market settings as well, with the purchase of smoke detectors, seat belt use, various automotive safety features, etc., having been studied. These other approaches are, however, typically undertaken to corroborate the more ubiquitous labor market approach. Focusing on the wage hedonic work that has been the dominant influence on environmental policy, there are numerous problems with the conduct and interpretation of these studies (see Dockins et al. 2004 for an excellent, and very complete, review of existing VSL studies and their limitations):

(a) Do people perceive low-probability risks at all accurately? Are actuarial risks more or less appropriate to use than perceived risks when the two differ? Are the actuarial risks themselves properly measured? (For example, a common observation is that the National Institute of Occupational Safety and Health (NIOSH) data on risks yield VSL estimates substantially higher than those obtained with Bureau of Labor Statistics (BLS) data, and it is likely that there is also substantial risk measurement error within each of these basic data sources.)
(b) Is the “marginal” worker in a risky occupation more concerned about risk than the “average” worker? If so, as is likely, the VSL will be biased upward by using the marginal worker’s required compensation.
(c) Does the functional specification matter (e.g., linear, ln-linear, Box-Cox, squared terms)? There is little or no theoretical evidence on which functional form is appropriate to apply.


(d) Does inclusion or exclusion of other variables affecting wages result in big apparent changes in VSL? For example, the risk of non-death injury is likely to be highly correlated with the risk of death; omitting the former will bias the latter upward. Black et al. (2003) find the coefficients from the wage hedonic to be highly unstable with respect to both functional form and data selection.
(e) Finally, has the Environmental Protection Agency (EPA) ignored potentially important additional concerns? The EPA does not support adjustments to VSL based on how one dies in specific jobs, age, cross-sectional income, non-death risk dread (e.g., cancer), baseline health status, or voluntariness/controllability of risk – yet each of these might be relevant for an individual’s willingness to pay for risk reduction.

Trudy Cameron (2010) offers a recent balanced view on the nature of VSL which, among other contributions, suggests that a “less incendiary” terminology than value of statistical life be substituted, perhaps “willingness-to-swap (WTS) alternative goods and services for a micro-risk reduction in the chance of sudden death.” Progress in the estimation of VSL is ongoing, and many of the concerns raised here are being examined in an effort to improve existing VSL estimates, as seen in the Cameron paper. However, a central insight cannot be escaped: any policy decision that involves changes in the probability of death inevitably places an implicit valuation on a statistical life. Explicitly using a specific VSL number is quite likely to lead to better decisions, and to decisions that can be analyzed to determine how sensitive the benefit numbers are to alternative assumptions about the magnitude of VSL.
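The lumberjack arithmetic above can be written out directly; this short sketch simply reproduces the chapter’s illustrative numbers, which are not empirical estimates.

```python
# Reproducing the chapter's lumberjack example (illustrative numbers only)
risk_increase = 1 / 100_000   # extra annual probability of on-the-job death
wage_premium = 50.0           # extra annual compensation required ($/worker)
workers = 100_000

vsl = wage_premium / risk_increase       # $5,000,000 per statistical life
excess_deaths = risk_increase * workers  # 1.0 expected excess death
policy_benefit = 20 * vsl                # $100 million for 20 lives saved
print(vsl, excess_deaths, policy_benefit)
```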

3 Hedonic Valuation of Environmental Quality

There have been many studies using either (or, in rare cases, both) property value hedonic equations or wage hedonic equations to value environmental quality. In either approach, the dependent variable (wages or property value) is regressed upon as many causative independent variables as are reasonably available, to which are added variables measuring environmental quality. Numerous readily available review articles have dealt with the many theoretical and econometric issues raised by the hedonic method (e.g., Palmquist 2005; Taylor 2008), while Graves (2011) presents a simplified verbal and graphical exposition that is accessible to those with widely varying backgrounds. The approach taken here is to provide a “middle-ground” verbal treatment of the hedonic method, one that will be seen to clarify some interpretations that are either not widely known or that are ignored in typical studies.

As was the case with VSL studies, accurate valuation of environmental improvements in either land or labor markets requires that households have “good” (ideally perfect) perceptions of both (i) where it is clean and dirty and (ii) what various levels of environmental quality mean for their health and welfare. Under such strong assumptions, one would expect people to ponder how to avoid risks of death, on the one hand, or pollution damages, on the other. The insight that underlies the hedonic approach to environmental valuation is that as long as an individual’s marginal cost of avoiding damages is less than the marginal benefit of avoiding damages, that individual would be expected to continue to avoid damages until marginal costs and benefits are equated. Households can lower pollution damages by moving either to a cleaner town or to a cleaner part of the town they currently occupy. However, since many other movers and non-movers would – other things equal – prefer to occupy cleaner locations, other things cannot remain equal. As will soon become clear, one would expect to observe falling wages and rising housing prices in the clean location until identical households are no better off in a clean location than in a dirty one. While this central idea is straightforward, there is confusion in the details, a confusion this chapter is designed to clarify.

We take up the labor market approach in Sect. 4, since it follows naturally from the VSL discussion, turning to the property value approach in Sect. 5. The only difference from the earlier discussion is that rather than focusing on wage compensation for risks of death on the job, we focus on environmental quality, which varies among labor markets and hence should lead to varying levels of wage compensation among those labor markets.

4 Wage Compensation for Environmental Amenities

If City A, one of two otherwise equivalent cities, has higher pollution levels than City B, one would expect residents to move from A to B, reducing the labor supply (raising wages) in A and increasing the labor supply (lowering wages) in B – and one would expect this movement to continue until the relatively lower wage in B exactly compensates for the utility value of B’s better environment (as we shall see later, this expectation is not fully correct). One powerful advantage of this approach is that the benefits of environmental cleanup are directly observed in dollar terms, which makes for very convenient comparison to the dollar costs of policies that would result in cleaner cities. Moreover, nonlinearities and synergistic interactions among various pollutant types can readily be explored. This can be seen with reference to the following estimation equation:

$$W = \alpha + \beta X + \gamma\, PM_{10} + \theta (PM_{10})^2 + \delta\, SO_2 + \lambda (SO_2)^2 + \eta (PM_{10} \cdot SO_2) + \varepsilon \tag{1}$$

where W is annualized (or hourly) wages; X is a complete vector of the traditional wage determinants employed in earnings functions in the labor economics literature (education, experience, age, occupation, region, union status, etc.); and β is the vector of coefficients on the variables in X. PM10 is particulate matter 10 μm in diameter or smaller, SO2 is sulfur dioxide, and the Greek letters preceding these variables are their respective regression coefficients. The error term, ε, of the regression must meet certain classical regression requirements (iid, no spatial autocorrelation, etc.), with failure to meet those requirements suggesting mis-specification of the regression model. Once a data set, hopefully with many observations, has been amassed and the regression in Eq. (1) has been properly specified and estimated, it is a simple matter to calculate marginal pollution damages:

$$\partial W / \partial PM_{10} = \gamma + 2\theta (PM_{10}) + \eta (SO_2) \tag{2}$$

The interpretation of Eq. (2) is quite simple: the first term, γ, is the marginal damage from an incremental change in PM10 under linearity (expected to be positive, as discussed earlier); the second term indicates the degree of nonlinearity (e.g., marginal damages are increasing in pollution levels if θ > 0); and the final term indicates the extent to which PM10 damages depend on how much SO2 is present. All of the coefficients are in convenient dollar form, and to the extent that the latter two terms are significantly different from zero, public policy should set pollution standards (and economic incentives) for any particular pollutant that vary with both (a) levels of that pollutant and (b) levels of other pollutants present. At present, this possibility is completely ignored in environmental policy, and a fruitful line of research would be to delve more deeply into nonlinear and synergistic damages.

Since there is very little theoretical guidance on the appropriate functional form for pollution damages, researchers (inadvertently) and advocates (deliberately) might well distort environmental values by their choices along a number of dimensions (omitting variables that are positively or negatively correlated with the environmental variables, employing a linear model when the data suggest a nonlinear form is more appropriate, etc.).

In closing the discussion of the wage hedonic approach, it should be reemphasized that this method only works well when people are very aware both of where it is clean and dirty and of how working in a clean or dirty location affects them. Bockstael and McConnell (2007), however, in a review of wage studies, find clear evidence that households are willing to give up wages to live in cleaner locations.
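A minimal sketch of estimating Eq. (1) and evaluating Eq. (2) follows, using simulated data; all magnitudes and variable names are invented for illustration, whereas a real study would use observed wages and pollution levels.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
# Simulated workers across labor markets (all magnitudes invented)
df = pd.DataFrame({
    "educ": rng.integers(8, 21, n),
    "exper": rng.integers(0, 40, n),
    "pm10": rng.uniform(10, 80, n),   # particulates, micrograms/m^3
    "so2": rng.uniform(2, 40, n),     # sulfur dioxide
})
# Annual wage with compensating differentials for pollution exposure
df["wage"] = (20000 + 1500 * df["educ"] + 300 * df["exper"]
              + 120 * df["pm10"] + 0.8 * df["pm10"] ** 2
              + 60 * df["so2"] + 1.5 * df["pm10"] * df["so2"]
              + rng.normal(0, 4000, n))

# Estimate Eq. (1): quadratic terms via I(), interaction via ':'
fit = smf.ols("wage ~ educ + exper + pm10 + I(pm10 ** 2)"
              " + so2 + I(so2 ** 2) + pm10:so2", data=df).fit()

# Marginal damage of PM10 per Eq. (2): gamma + 2*theta*PM10 + eta*SO2
p = fit.params
md_pm10 = p["pm10"] + 2 * p["I(pm10 ** 2)"] * 40 + p["pm10:so2"] * 20
print(round(md_pm10, 2))  # $/year per unit PM10, at PM10 = 40, SO2 = 20
```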

5 Property Value Compensation for Environmental Amenities

The property value or rent compensation method employs a virtually identical way of thinking but applies the notion that movements will equilibrate utility within an urban area through adjustments in land values. How much a house will rent or sell for is clearly related to the bundle of positive and negative traits that comprise it. The traits are many: structural (e.g., stone or wood, square footage, number of bathrooms, lot size, type of heat), neighborhood (e.g., school quality, crime rates, access to a wide variety of destinations, notably the central business district in traditional urban models), and – our interest here – environmental quality.

Environmental quality is sometimes viewed as a “public good” in the sense that whatever environment exists in an area is essentially unaffected by an individual household’s behavior and that an individual household cannot be excluded from enjoying whatever level of environmental quality exists in that area. The property value hedonic method relies on the location specificity of pollution levels – the fact that they vary over space in an urban area – to convert environmental quality into a private good that is “bundled” with housing choice. As with the wage hedonic, assuming that perceptions are “good” (ideally perfect), the value of varying levels of pollution within a city should be captured in property values. The process is quite similar to the wage hedonic approach and can be represented as in Eq. (3):

$$PV = \alpha + \beta X + \gamma\, PM_{10} + \theta (PM_{10})^2 + \delta\, SO_2 + \lambda (SO_2)^2 + \eta (PM_{10} \cdot SO_2) + e \tag{3}$$

where PV, property value, is ideally the actual sale price rather than the listing price. The only important difference from Eq. (1) is that instead of containing variables affecting wages, the X vector comprises all structural and neighborhood traits affecting housing value, with the other variables as defined earlier. The Greek coefficients are the regression coefficients of a properly specified model resulting in an error term with appropriate properties. To find how property values vary in a systematic, functional way with pollution levels, we again partially differentiate Eq. (3) with respect to a pollutant of interest, say particulates:

$$\partial PV / \partial PM_{10} = \gamma + 2\theta (PM_{10}) + \eta (SO_2) \tag{4}$$

The interpretation of the coefficients is exactly as before, with γ capturing the linear impact of pollution on property value, 2θ capturing the extent of any nonlinearities, and η testing for synergisms (if η > 0, damages are “supra-additive”; if η < 0, damages are “sub-additive”). Krumm and Graves (1982) found a significantly positive η, indicating synergistic increases in particulate damage, measured by hospital admissions, when more sulfur dioxide is present. As with the wage hedonic, the coefficients give us marginal damages (the benefits of cleaning up) in a very convenient dollar form, enabling comparison to marginal provision costs.

As with the wage hedonic approach, there is little theoretical guidance as to the functional relationship between property values or rents and the traits that exert a causative influence, allowing advocates to intentionally publish widely varying results even from identical raw data. Krumm and Graves employed a methodology devised by Zellner and Siow (1980) that, at least in principle, eliminates biases when theoretical guidance on functional form is limited. The potential to deliberately publish biased results is of more than academic concern, since there is considerable evidence that estimated property value effects of pollution are not robust to alternative specifications (see Graves et al. (1988) for more in-depth discussion).

For either the wage or the property value method, problems related to data limitations or to the assumption of perfect information exist. If some other disamenities are positively correlated with the pollution measure and those other disamenities are omitted from the equation, the value of the pollution damages will be biased upward. For example, suppose that the more polluted parts of a city are also less desirable for other reasons (more crime, worse schools, more graffiti, street potholes, poorer lighting, fewer parks, etc.) and these other traits are omitted from the equation. By not including the other goods that are correlated with pollution, the impact of pollution will appear larger than it is, since the effects of the non-included variables will be partially attributed to environmental quality (the magnitude of the bias equals the coefficient the omitted variable would have if included, multiplied by the coefficient from a regression of that omitted variable on pollution). With constantly improving data acquisition, this problem is becoming less important over time.

Since experts argue heatedly about health and other damages, and since many pollutants are odorless, colorless, and tasteless in ambient concentrations, it is plausible that households might fail to fully perceive (a) the impact of pollution on their health and well-being, (b) how pollution levels vary over space, or (c) both. To the extent that perceptions are imperfect, one would expect the hedonic methods to yield pollution damage coefficients that are biased downward, since households would not be expected to pay for unperceived benefits of cleaner locations.

What is the net effect of these potential biases, one suggesting overvaluation and the other undervaluation? Nobody knows the answer with confidence, but a great many property value studies – as was the case with the smaller number of wage differential studies – show a strong positive relationship between property values and environmental quality. The property value approach might be thought particularly useful for valuing spatially concentrated environmental damages (e.g., toxic waste dumps), while the wage differential approach might seem more appropriate for region-wide amenities (e.g., large pollution clouds or climate). As we shall see in the following section, these beliefs are, generally, quite flawed.
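The omitted-variable problem described above is easy to demonstrate by simulation. In this sketch (all magnitudes invented), omitting a correlated disamenity such as crime inflates the apparent pollution effect by the omitted coefficient times the auxiliary regression coefficient of crime on pollution.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
# Pollution and an unobserved, correlated disamenity (crime, say)
pollution = rng.normal(50, 10, n)
crime = 0.6 * pollution + rng.normal(0, 8, n)   # assumed relationship

# "True" hedonic: both traits lower sale price (coefficients invented)
price = 300_000 - 800 * pollution - 500 * crime + rng.normal(0, 10_000, n)

full = sm.OLS(price, sm.add_constant(np.column_stack([pollution, crime]))).fit()
short = sm.OLS(price, sm.add_constant(pollution)).fit()

print(full.params[1])   # close to -800: unbiased when crime is included
print(short.params[1])  # close to -1100: bias = (-500) * 0.6 from omission
```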

6 Wage and Property Value Hedonics Are Not Alternatives: The Multimarket Hedonic Method

Until fairly recently (new information spreads slowly), the two approaches to valuing pollution damages were viewed as alternatives. It was thought that clean air, for example, could be valued either by variation in property values within an urban area or by wage variation between urban areas. Indeed, if the values happened to be similar under the two methods, greater confidence was placed in either as a measure. It turns out that this is incorrect under plausible assumptions about people’s behavior when evaluating locations. For this view to be valid, households would have to follow a two-stage procedure when locating: first, looking only at wages, select a labor market; then, at a second stage, select a location within that labor market. This would clearly be irrational, since households could make better location decisions by looking at the combination of wages, rents, and amenities available in all locations prior to selecting their best location.


Another way to think about this is that, between two otherwise identical locations, the more polluted one will be less attractive – so people will move from the more-polluted to the less-polluted location until they are equally well off in both. But as they move into the less-polluted location, they both increase the supply of labor (driving down wages) and increase the demand for housing (driving up property values and rents). Hence, the true value of the less-polluted location is the sum of what must be paid for reduced pollution in both the labor and land markets.

To many, the argument of the preceding paragraph is not immediately convincing, so additional discussion is useful. Suppose, as an initial assumption to be dropped shortly, that the entire world were a flat, featureless plain where all locations are identical. In this scenario, there is no variation in closeness to the ocean, scenic views, and the like. Just as there would be no reason to pay more for identical automobiles, there would be no reason to pay more in either land or labor markets for one location over another. Further, assume for now that all households have the same preferences and all firms have the same cost functions (and are selling on national markets at one price, hence have the same profit functions in all locations). With these assumptions in place, there would be no variation in household demand for lots of different sizes or for hours worked, nor would there be variation in the relative land/labor intensity of firms. In this simple initial scenario, wages and rents would – in equilibrium – have to be the same in all locations. If, for example, there were a location with higher rents, households would have to be compensated by higher wages or they could not be equally well off there vis-à-vis elsewhere. But if they are compensated with higher wages, the higher wage/rent location would have to be less profitable than other locations for firms; hence, firms would leave, reducing the demand for labor and indirectly reducing land demand as household employment falls. If a location had lower rents than elsewhere, households would move in until lower equilibrium wages made them indifferent to other locations – but the lower rents and wages would stimulate firm in-migration until wages and rents were raised to those of other locations. Hence, were the world as boring as the flat, featureless plain and the homogeneous household/firm assumptions imply, rents and wages would have to be identical in all locations in equilibrium (see Graves 2011 for a full graphical presentation of this and the subsequent discussion).

Now let us begin dropping these unrealistic assumptions, first by introducing variation in an amenity that households care about (e.g., a scenic view or lower humidity) but which has no impact on firm profitability. If we are at an initial equilibrium with wage and rent levels equal in all locations, any location possessing more than the average amount of the desirable amenity will be more attractive and hence will lure in-migration of households. But that in-migration will result in increased labor supply along with increased land demand. Hence, the desirable location(s) will have lower wages and higher rents, in some combination that renders utility the same – in equilibrium – in the desirable location(s) as in the average locations.
Similarly, undesirable locations will experience household out-migration at the initial common wage and rent levels, resulting in some combination of lower

1512

P. E. Graves

rents and higher wages in undesirable locations relative to average locations. Note that the compensation paid (for desirable locations) or received (at undesirable locations) represents a measure of “quality of life.” The higher rents and lower wages do not represent a higher “cost of living” in the nice locations but rather a higher “benefit of living” there. The higher benefits of living in the desirable location – as with quality variation among ordinary goods – must be paid for in equilibrium. Hence, were all households homogeneous, there would be in equilibrium no reason to prefer one location over another, despite wide variation in amenity levels, since any gain in amenities would be fully offset by higher rents and lower wages and conversely. Locations that are unusually nice for households will have larger populations than other places. If an amenity affects firm profitability (e.g., access to resource inputs) without having any impact on household utility that will not, in equilibrium, result in greater profits for the firm. Rather, firms will enter driving up land rents directly and indirectly via employment and driving up wages (the latter necessary to compensate households for the higher rents, which is made necessary by the fact that the location is no “nicer” for them). Note that in this case, the higher rents do represent a higher “cost of living” but that higher cost must be completely offset by higher wages. Locations that are unusually nice for firms will, as with household amenities, have larger populations than other places. The preceding two cases lead to nine spatial combinations, with a rich tapestry of possible wage and rent combinations: (a) The “average” location (average wage, W0, average rent, R0, and average size, S0) (b) Nice for households, neutral for firms (lower W, higher R, larger S) (c) Bad for households, neutral for firms (higher W, lower R, smaller S) (d) Nice for firms, neutral for households (higher W, higher R, larger S) (e) Bad for firms, neutral for households (lower W, lower R, smaller S) (f) Nice for both households and firms (ambiguous W, higher R, larger S) (g) Bad for both households and firms (ambiguous W, lower R, smaller S) (h) Bad for households, good for firms (higher W, ambiguous R, ambiguous S) (i) Bad for firms, good for households (lower W, ambiguous R, ambiguous S) Case 9, which we will return to in the following subsection, is of particular interest for environmental policy, since many environmental policies raise the costs of firms but provide benefits to households. Until the early 1980s, most economists believed that imposing stringent controls on firms in a location would result in them leaving that location. This led to fears of a “race to the bottom,” since firms leaving raise unemployment in the short run and firms entering less stringently regulated areas would reduce unemployment in the short run. This presumption was based on a focus on the firm impact, ignoring the impact on households. If the cost increases associated with the environmental policy are relatively small and the household benefits relatively large, the location might well experience growth as it moves to a larger equilibrium size, S. Ignored in, but implicit in, the hedonic discussion to this point is the impact of in-migration and out-migration on what might be called “endogenous amenities and


That is, if a desirable location for households exists, one would expect in-migration until the lower wages and higher rents render that location no more desirable than other locations. But it is also the case that in-migration might increase levels of endogenous disamenities (e.g., pollution, congestion) or might increase levels of endogenous amenities (e.g., restaurant diversity, local goods with scale economies in production). In a full general equilibrium analysis with all important amenities included, this would not matter, because the "net niceness" of the location would still be captured by rents and wages. But data limitations in actual studies are likely to lead to mismeasurement of the value of amenities. If, for example, measures of increased cultural opportunities or restaurant quality and diversity are positively correlated with the amenity but are omitted from the hedonic estimates, then the value of the amenity will be overstated by the observed wage/rent differentials – the coefficient on the amenity variable will be inflated by the omitted variables' effect times their correlation with the amenity. Similarly, if increased congestion is positively correlated with the amenity but omitted from the equation, the wage/rent variation will understate the value of the amenity. In the environmental context, increases in pollution and congestion are both likely consequences of movements to desirable – perhaps because of better climate – locations, but if changes in congestion are omitted from the estimating equation, the pollution variable will pick up the effect of congestion times its correlation with pollution.

A common criticism of the underlying assumptions of the hedonic method is that households and firms, especially the latter, might have very high movement costs; hence, disequilibrium might persist for very long periods. If this is the case, then observed wage and rent differentials would not be entirely compensatory. That is, high-wage places might be "high-utility" places because more of all goods could be consumed there, while high-rent places might be "low-utility" places because fewer goods could be consumed in such high-cost locations. Two observations are pertinent to this issue. First, it might not take many people or firms actually moving to yield a close approximation to a full-mobility equilibrium. This is analogous to the fact that only a few drivers need to move from "slow lanes" to "fast lanes" on a freeway at rush hour to make all lanes equally fast. Second, as an empirical matter, households have in recent decades been moving toward high-rent and low-wage locations. With rising nationwide incomes, this trend is consistent with an equilibrium in which desirable locations are normal or superior goods (i.e., at higher national incomes, there is even greater demand for the already desirable subnational locations).
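To fix ideas, the equilibrium compensation logic and the omitted-variable problem just described can be written compactly. The notation below is ours (V: household indirect utility; w: wage; r: rent; l: land consumed; a: the amenity; c: an omitted correlated amenity); it follows standard treatments in the spirit of Roback (1982) rather than anything derived in this chapter:

```latex
% Spatial equilibrium for homogeneous households: utility is equalized
% across locations, so amenity differences are fully compensated:
\[
  V(w_0, r_0; a_0) \;=\; V(w_1, r_1; a_1)
\]
% Differentiating yields the implicit price of the amenity, paid partly
% in the land market and partly in the labor market:
\[
  p_a \;=\; l\,\frac{\partial r}{\partial a} \;-\; \frac{\partial w}{\partial a}
\]
% If a correlated amenity c (true effect beta_c) is omitted from the
% hedonic regression, the estimated amenity coefficient is biased by
% the omitted variable's effect times its regression on a:
\[
  \mathbb{E}\!\left[\hat{\beta}_a\right] \;=\; \beta_a
  \;+\; \beta_c\,\frac{\operatorname{Cov}(a,c)}{\operatorname{Var}(a)}
\]
```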

7 What if Single-Market Hedonic Analyses Are Employed Rather than Multimarket Analyses?

Very few multimarket hedonic analyses have been conducted (for an early contribution, see Blomquist et al. 1988), while a very large number of hedonic analyses have been conducted in either the labor or land market considered separately. What are the implications for environmental valuation of using a rent hedonic or a wage hedonic rather than the combined analysis implied by the prior discussion? The taxonomy of household/firm amenity combinations, (a) through (i) above, has clear implications for the valuation biases introduced by failure to consider both markets. We will focus on the policy-relevant case in which one is attempting to determine the value of environmental quality to households (to use that information to infer benefits to be compared to costs in environmental benefit-cost analysis).

Many environmental policies tend to be applied uniformly over space, but that does not, in general, mean that their benefits and costs are uniformly distributed over space. For example, required catalytic converters on automobiles raised costs in rough proportion to population, but the benefits of that policy would be much higher in locations (e.g., Los Angeles, Phoenix, Denver) that are sunny and warm and/or have stagnant air conditions. Hence, a uniform policy can have pronounced effects in making locations such as Los Angeles relatively more attractive than they would otherwise be, encouraging in-migration and resulting in higher land values and lower wage rates. In cases such as this, where there are negligible impacts on local firms, using either a property value hedonic or a wage differential hedonic in isolation would clearly underestimate environmental benefits relative to a multimarket approach. The extent of the bias will depend on the relative capitalization rates and on which of the two hedonic methods is chosen – if most of the impact of the policy goes into wages, a property value approach will greatly underestimate benefits, and conversely.

Other, more location-specific environmental policies, such as Pittsburgh introducing controls on steel polluters in the 1950s prior to nationwide control policies, will have direct impacts on both local firms (harmful) and local households (favorable). The harm to firms would lower the demand for labor, while the desirable impacts on households would increase the supply of labor. Both of these effects cause wages to fall, but the net effect on property values/rents is ambiguous, depending on whether the city gets larger or smaller as a result of the policy. In such cases, a wage hedonic is much more likely to value the environmental improvement accurately than a rent hedonic, the latter falsely implying little or no environmental benefit from the policy. Only as a "fluke" would the wage hedonic pick up the full value of the environmental policy; in general, adding the information from a property value study would lead to more accurate estimates. It should be noted that the compensation shares are not limited to [0,1]: more than 100% of the benefits could go into wages, and the rent compensation could actually be negative (e.g., if the environmental policy harmed firms so much that the city got smaller, with lower rents in equilibrium). If a property value study were used in this case, it would seem as though the environmental improvement had negative value!

If an environmental policy at a location is good for both households and firms, both would want to move in. Suppose, for example, that a nationwide law is passed that subsidizes firms to clean up in areas where air pollution standards are not attained, with no subsidy in areas meeting current standards. This situation would cause rents to rise in locations subject to the policy, with an ambiguous impact on wages: wages would rise if the policy benefited firms relatively more than households, while wages would fall if the policy benefited households relatively more than firms. In this case, the value of the environmental amenity would appear largely in land markets, but again only as a fluke would there be no labor market effects, and the choice of a hedonic wage analysis would greatly understate the value of the cleaner air.

It has been assumed to this point that all households and all firms are homogeneous. This is of course not the case in realistic settings. Land-intensive firms would not be expected to be found in locations where land is expensive (which is why corn is not seen growing in downtown New York City). Similarly, households with unusually strong preferences for land, perhaps those with large families or pronounced gardening desires, will not locate where land is very expensive, perhaps locating in suburbs or exurbia rather than in more central areas. If a firm's labor demands are unusually large, it will avoid locations with unusually high wages. Households that do not supply labor (e.g., the retired, as discussed in Graves and Waldman 1991) will want to locate where amenities are mostly paid for in wages rather than rents, as will those with very high demand for services. Conversely, households that supply low-skilled labor to service industries are likely to be priced out of very desirable and high-rent locations (e.g., Malibu, CA; Aspen, CO; or Key West, FL) and will have to be compensated with higher wages to locate there or commute in to work; that is, the low-skilled may actually have higher wages in desirable locations.

As the preceding discussion makes clear, and as even casual reference to the real world verifies, there is a very rich tapestry of locational choices when the full implications of the role of firm and household amenities are considered. This is even more the case when endogenous amenities are considered, such as the number of similar people present in a community (e.g., the ethnic neighborhoods of large cities that often make them much more attractive to particular types of people than they would otherwise be).

Summarizing, there are five reasons why hedonic methods are likely to understate the value of environmental quality improvements. The first, and most obviously damaging, is that the benefits of environmental quality must be fully perceived by households for them to be willing to pay more for cleaner locations. As mentioned earlier, even the world's foremost health experts have spirited debates about the role various pollutants play in human disease and death. It seems very implausible that ordinary people would be able to perceive such things accurately. Additionally, many pollutants are odorless, colorless, and tasteless in normal ambient concentrations; hence, ordinary people might be unable to distinguish the clean places from others. It is unlikely, then, that many important environmental effects would be capitalized into property values. Why, then, do hedonic studies show such large environmental effects? It is certainly the case that people will perceive localized smells, bad visibility, and other impacts of pollution that are inevitably revealed by our five senses. Yet it is precisely such perceived damages that are ignored in the sum of specific damages approach (sometimes called the "health effects" or "averting behavior" approach), which is often used in environmental policy analysis.

A good argument – the second reason why hedonic methods understate environmental values – could be made for adding the damages estimated via the sum of specific damages (lives saved, reduced asthma attacks, etc.) to those estimated via hedonic methods. This follows from the fact that the damage categories measured by the two methods exhibit very little overlap – damages that are perceived would be expected to go into property values and wage differentials, while damages that are unperceived would be measured by the sum of specific damages approach.

The third reason – discussed at length earlier – why hedonic methods are likely to understate environmental values is that separate analyses in labor or land markets are still the norm, even though it has been known for several decades that only a multimarket hedonic can accurately capture the full value of the environment. The circumstances under which a single-market analysis could accurately value an environmental amenity are extremely rare (e.g., a fixed housing stock, a retired population).

The fourth reason for expecting the hedonic method to understate true benefits is that the hedonic method, even properly conducted, captures only the use benefits of the environmental resources of concern, since the amenities are bundled with housing and jobs. Nonuse benefits might well be of greater magnitude in particular environmental settings, and policies allocating the environmental resource should, on efficiency grounds, encourage highest-value usage even if that results in nonuse of the environmental resources. To illustrate: is the California Coastal Commission properly allocating scarce ocean locations? It is clear that in the absence of this regulatory authority, virtually the entire coast of California would be lined with high-rise condos, looking much more like Miami than at present. But the scenic Pacific Coast Highway has value to all who drive it, and to a large extent that value has been perceived as being of greater importance than the (admittedly very large) benefits households would receive if the coast were opened to unrestricted development.

The final reason why hedonic methods might be expected to understate the benefits of environmental cleanup stems from the supplies of clean locations relative to the demands for them. The hedonic method results, at least in principle, in zero spatial consumer surplus for similar households. That is, if one location is nicer than another, households will continue to move to the nicer location until it is no longer nicer – until identical locations have identical full prices. There will be no consumer surplus over space, and indeed this is one of the reasons the hedonic method is desirable: the full benefits that are perceived are measured. But the fact that people are very different means that understatement of environmental benefits (damage reduction) can occur if there are more locations with the amenity than there are people strongly desiring the amenity. Suppose, for example, that there are very few households containing really unhealthy individuals – individuals with weakened cardiopulmonary systems who would be highly damaged by pollution. Such households might be willing to pay a great deal for a very clean location, but they might only have to pay a much smaller amount if the number of somewhat clean locations is large relative to the number of these households.


They will get, in other words, consumer's surplus over space. Inferring the value of cleaning up the environment from the average person in this case would ignore the high marginal benefits received by these households. As another illustration of the potential importance of this point, a hedonic analysis of a large city might suggest that its mass transit system has low value, because those who have the greatest use value (e.g., the disabled or those who particularly dislike automobile commuting) may only have to pay a small portion of their true willingness-to-pay in land or labor markets. When one considers the very large number of traits that can matter to a heterogeneous population with very diverse preferences, it becomes clear that a great deal of consumer surplus can remain in the hedonic equilibrium. In the case of incrementable environmental goods, the unobserved consumer surplus corresponds to a higher marginal value that might, if observed, justify a policy intervention to increase levels of the public good.

The hedonic method is quite popular due to its ability to provide a convenient dollar measure of marginal environmental damages (damage reduction being the benefit of environmental cleanup policies). The limitations discussed here imply that there is a great deal of room for improvement in this method, and they raise issues of how best to get at the total marginal benefits, measured in all of the markets in which households must pay.
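As a minimal numerical sketch of the single-market bias discussed in this section (the dollar amounts and the 60/40 wage/rent split are hypothetical, chosen purely for illustration; nothing here comes from the chapter's data):

```python
# Hypothetical illustration: how single-market hedonic studies understate
# (or misstate) an amenity value that is capitalized in two markets.

def multimarket_value(annual_rent_diff, annual_wage_diff):
    """Full annual implicit price of the amenity: what households pay in
    higher rents PLUS what they forgo in lower wages."""
    return annual_rent_diff + (-annual_wage_diff)

# Suppose the true annual amenity value per household is $1,000, with
# 40% capitalized as higher rents and 60% as lower wages (assumed split).
rent_diff = 400.0    # amenity location rents are $400/yr higher
wage_diff = -600.0   # amenity location wages are $600/yr lower

print(multimarket_value(rent_diff, wage_diff))  # 1000.0 -> full value
print(rent_diff)                                # 400.0  -> rent-only study
print(-wage_diff)                               # 600.0  -> wage-only study
# Compensation shares need not lie in [0,1]: if a policy shrinks the city,
# rents can fall (rent_diff < 0) while wages fall by more than the full
# amenity value, so a rent-only study would even sign the benefit wrong.
```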

8 Conclusion

The goal of this chapter was to describe the hedonic method as a means of valuing environmental quality improvements. The hedonic approach requires very good, ideally perfect, perceptions of environmental benefits (or of risk, in the VSL case) along with comparably good knowledge of how environmental quality varies over space (or how risk varies over jobs, in the VSL case). These assumptions are highly suspect in many settings. Moreover, it remains the case that expert legal testimony and typical regulatory practice still commonly employ either a property value study or a wage study, despite our having known for more than two decades that compensation for environmental amenities and disamenities will generally occur in both the land and labor markets.

The extent to which damages appear in land versus labor markets will generally vary with many things, but considering either market separately is likely to greatly underestimate the damages from pollution. If an environmental pollutant is highly concentrated (e.g., a hazardous waste dump), one would expect a greater percentage of its damage to appear in property values, while the damages from more regionally ubiquitous pollutants might be expected to appear primarily in wage rates. The existence of firm amenities and disamenities complicates the ability to establish general conclusions, but it remains the case that using only one of the two markets in which environmental quality is valued generally results in understatement of environmental values.


References

Black DA, Galdo J, Liu L (2003) How robust are hedonic wage estimates of the price of risk? Final report to the USEPA (R 829-43-001)
Blomquist GC, Berger G, Hoehn J (1988) New estimates of the quality of life in urban areas. Am Econ Rev 78(1):89–107
Bockstael NE, McConnell KE (2007) Hedonic wage analysis. In: Environmental and resource valuation with revealed preferences: a theoretical guide to empirical models. The economics of non-market goods and resources, vol 7. Springer, New York, pp 151–187
Cameron T (2010) Euthanizing the value of a statistical life. Rev Environ Econ Policy 4(2):161–178
Court AT (1939) Hedonic price indexes with automotive examples. In: The dynamics of automobile demand. General Motors Corporation, New York, pp 99–117
Dockins C, Maguire K, Simon N, Sullivan M (2004) Value of statistical life analysis and environmental policy: a white paper. U.S. Environmental Protection Agency, National Center for Environmental Economics, April 21. For presentation to Science Advisory Board, Environmental Economics Advisory Committee
Graves PE (2011) The hedonic method: value of statistical life, wage compensation, and property value compensation. In: Batabyal A, Nijkamp P (eds) Research tools in natural resource and environmental economics. World Scientific, Singapore, pp 187–213
Graves PE, Waldman DW (1991) Multimarket amenity compensation and the behavior of the elderly. Am Econ Rev 81(5):1374–1381
Graves PE, Murdoch JC, Thayer MA (1988) The robustness of hedonic price estimation: urban air quality. Land Econ 64(3):220–233
Griliches Z (1961) Hedonic price indexes for automobiles: an econometric analysis of quality change. In: The price statistics of the federal government. NBER staff report no. 3, General series no. 73. NBER, New York, pp 173–196
Krumm R, Graves PE (1982) Morbidity and pollution. J Environ Econ Manag 9(4):311–327
Palmquist RB (2005) Property value models. In: Mäler K-G, Vincent JR (eds) Handbook of environmental economics: valuing environmental changes, vol 2. North Holland, Amsterdam, pp 763–819
Ridker RG, Henning JA (1967) The determinants of property values with special reference to air pollution. Rev Econ Stat 49(2):246–257
Roback J (1982) Wages, rents, and the quality of life. J Polit Econ 90(6):1257–1278
Rosen S (1974) Hedonic prices and implicit markets: product differentiation in pure competition. J Polit Econ 82(1):34–55
Taylor LO (2008) Theoretical foundations and empirical developments in hedonic modeling. In: Baranzini A, Ramirez J, Schraerer C, Thalmann P (eds) Hedonic methods in housing markets. Springer, New York, pp 15–38
Vail EE (1932) Retail prices of fertilizer materials and mixed fertilizer. AES Bulletin 545. Cornell University, Ithaca, NY
Waugh FW (1928) Quality factors influencing vegetable prices. J Farm Econ 10(2):185–196
Zellner A, Siow A (1980) Posterior odds ratios for selected regression hypotheses. In: Bernardo JM, DeGroot MH, Lindley DV, Smith AFM (eds) Bayesian statistics: proceedings of the first international meeting held in Valencia. University of Valencia Press, Valencia, pp 585–603

Materials Balance Models

Gara Villalba Méndez and Laura Talens Peiró

Contents
1 Introduction
1.1 Material Balance by Total Mass
1.2 Material Balance by Element
2 Applications of Material Flow Analysis
2.1 Studying Flows of Substances, Materials, and Products
2.2 Studying Firms, Sectors, and Geographical Areas
3 Case Studies
3.1 Material Balance Applied to Chemical Industry
3.2 Mass Balance Applied to Rare Earth Metals
4 The Laws of Thermodynamics
5 Conclusions
References

Abstract

This chapter presents an overview of the mass balance principle and its applications. It is an important tool for quantifying the wastes produced by economic processes: these wastes are equal in mass to the difference between the total raw material inputs to a process and its useful material outputs. Products are becoming more complex, which increases both input mass and wastes; it is safe to say that nowadays process wastes far exceed the mass of materials finally embodied in useful products. The application of the mass balance principle can take many shapes and forms, and this chapter illustrates a few. Using mass balance and chemical engineering knowledge of processes, we found that, on a yearly basis, the inorganic chemical industry has a yield of 91% (9% of the inputs end up as waste), while the organic chemical industry has a yield of 40%. A second example is the rare earth metal industry, where the potential recovery of these scarce metals is quantified to motivate reuse and recycling. Presently, less than 1% of rare earth metals are recovered from end-of-life products, but as demand for these resources increases in the near future for products such as electric motors and wind power turbines, recovery will become necessary. An introduction to thermodynamics and exergy is included, since all wastes are thermodynamically degraded compared with raw materials. The exergy of inputs, products, and wastes is an important factor to consider in process efficiency and environmental evaluation.

Keywords

Life cycle assessment · Wind turbine · Rare earth metal · Liquid crystal display · Caustic soda

1 Introduction

We constantly perform material flow analysis in our daily activities without realizing it. For example, when we balance our checking account, we add the money credited to our account to the current balance and subtract our expenses. Unknowingly, we are applying one of the most fundamental principles governing our existence: the mass balance principle, which states that mass can be neither created nor destroyed.

This physical law has nontrivial consequences in economics. Economic processes require inputs, both energy and matter, and invariably generate waste. Economic products are becoming more and more complex, requiring many times the materials and energy that are finally embodied in the product and resulting in many waste streams to air, water, and land. A cellular phone 10 years ago required on the order of 20 different materials, such as various plastics, copper, aluminum, and steel. Nowadays, a multifunctional mobile phone can contain as many as one thousand different kinds of materials (Mueller et al. 2003). Eventually the products become waste themselves. This leads to the idea of analyzing the material life cycle: a more comprehensive assessment of resource use and wastes, also referred to as a "life cycle analysis" approach.

The following equation expresses the mass balance principle:

Mass accumulation within the system = Mass input through system boundaries − Mass output through system boundaries + Mass generation within the system − Mass consumption within the system

The mass consumption and generation terms are associated with transformations due to chemical reactions. If no chemical reactions take place, these two terms are zero. If we further assume steady-state conditions, there is no accumulation of mass, and the equation simplifies to mass input = mass output.


Fig. 1 Sulfuric acid production: (a) mass balance by chemical compound and (b) mass balance by chemical element

There are two basic approaches that can be used to carry out a material balance. The analysis can be performed based on (a) the total mass in each stream entering and leaving the system and/or (b) the composition of each stream entering and leaving the system. Figure 1 represents the production of sulfuric acid (H2SO4), used here to illustrate both approaches. In general terms, sulfuric acid production consists of a series of chemical reactions in which water, oxygen, hydrogen, and sulfur are needed and in which emissions such as SO2 result as waste. Energy is needed to drive this process, which results in CO2 emissions, useful work, and heat loss – but for simplicity we will not consider energy terms.

1.1 Material Balance by Total Mass

If we know the total amount (in kg) of sulfuric acid produced and the inputs required, we can easily calculate the amount of emissions that result from this process. This is illustrated by Fig. 1a for the production of 1 kg of H2SO4, as expressed by Eq. (1):

Total input = product + waste
(190 g H2O + 500 g O2 + 330 g S) − 1000 g H2SO4 = emissions = 20 g   (1)

1.2 Material Balance by Element

If we want to know the composition of the emissions, we can perform a type (b) analysis, in which a mass balance is carried out element by element. Since the composition of the product is known, we can calculate the composition of the waste using the molecular weight (MW) of each element. This is illustrated in Fig. 1b: there is an input of 670 g of O, which comes from the water and the O2. We also know that the product output is 1 kg of H2SO4, which is equivalent to 10.2 moles of H2SO4, or 40.8 moles (about 650 g) of O. Applying Eq. (2) by element gives the following:


Total O input = O in product + O in waste
670 g O − 650 g O in product = O in waste = 10 g O   (2)

So now we know that, of the 20 g of emissions calculated using approach (a), roughly 10 g is oxygen. This is based on some simple calculations, but if we were to use simulation software that also takes thermodynamics into consideration, we could in principle know in which compound, and in what state, that oxygen ends up in the waste stream. To summarize, the MFA procedure follows these steps (a worked sketch follows the list):

(i) Define the process under study and the system boundaries, both spatially and temporally.
(ii) Label all flows: inputs, outputs, and accumulation.
(iii) Identify all known values of composition and stream flows.
(iv) List all independent mass balances that are possible. Assumptions must sometimes be made when sufficient data are not available.
(v) Solve the equations for the unknown variables.
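The following minimal Python sketch applies steps (i)–(v) to the sulfuric acid example above. The stream masses are those of Fig. 1 and Eqs. (1)–(2); the function and variable names are ours:

```python
# Mass balance for the sulfuric acid example of Fig. 1 (per 1 kg H2SO4).
# Approach (a): balance by total mass; approach (b): balance by element.

inputs_g = {"H2O": 190.0, "O2": 500.0, "S": 330.0}  # input streams, grams
product_g = {"H2SO4": 1000.0}                       # useful output, grams

# (a) Total-mass balance (Eq. 1): waste = total inputs - useful outputs.
emissions_g = sum(inputs_g.values()) - sum(product_g.values())
print(f"total emissions: {emissions_g:.0f} g")      # -> 20 g

# (b) Element balance for oxygen (Eq. 2), using molar masses in g/mol.
M = {"H": 1.008, "O": 15.999, "S": 32.06}
mw_h2o = 2 * M["H"] + M["O"]                        # ~18 g/mol
mw_h2so4 = 2 * M["H"] + M["S"] + 4 * M["O"]         # ~98 g/mol

o_input = inputs_g["H2O"] * M["O"] / mw_h2o + inputs_g["O2"]   # ~670 g
o_in_product = product_g["H2SO4"] * 4 * M["O"] / mw_h2so4      # ~650 g
print(f"O in waste stream: {o_input - o_in_product:.0f} g")
```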

2 Applications of Material Flow Analysis

The field of industrial ecology is well known for evaluating industrial systems based on material flow analysis. The industrial system under study can be at any level, the most encompassing being global and the simplest being a single manufacturing process, such as the sulfuric acid production illustrated earlier. The basic approach is that the system to be analyzed is viewed as a transformation process that requires certain inputs such as material, energy, and "free goods" from the environment. These are converted to products, by-products, and wastes that can be airborne, liquid, or solid. In other words, the system "digests" raw materials into products, and the whole process is referred to as "industrial metabolism." Thanks to the mass balance principle, wastes can be calculated if we know the sum of the inputs and the useful outputs. There are different types of MFA, depending on what system needs to be evaluated and what the objectives are. Figure 2 summarizes the types of material flow-related analysis that can be done. Basically, MFA is divided into two types: (a) the study of specific environmental problems related to certain impacts per unit of flow of substances, materials, and products within certain firms, sectors, and regions and (b) the analysis of problems of environmental concern related to the throughput of firms, sectors, and regions associated with substances, materials, and products.

2.1 Studying Flows of Substances, Materials, and Products

The main purpose of studying flows of substances, materials, and products through a system is to clearly define the metabolism of the material or system under study.


Fig. 2 Types of material flow-related analysis. (Source: Adapted from Bringezu and Moriguchi 2002)

For example, the study of the major chlorine-based chemicals defines the chlorine metabolism throughout the economy and also identifies other production processes where chlorine (Cl2) is used to produce intermediate products such as caustic soda (NaOH) or final products such as paper (Ayres and Ayres 1998). Figure 3 shows the material flows in the production of major chlorine-based chemicals in the USA in the year 1993. This study quantifies all chlorine wastes and emissions during production and also identifies the most intensive chlorine-consuming sectors, in this case organic synthetics (11.15 million tons) and paper and pulp mills (4.14 million tons). Quantifying chlorine losses was useful because of chlorine's toxic potential and various pollution problems, the ozone-depleting effect of chlorofluorocarbons (CFCs), and the risks incurred through the incineration of materials such as polyvinyl chloride (PVC). This type of MFA, also referred to as substance flow analysis (SFA), determines the main entrance routes of chlorine into industry, which is useful for qualitatively assessing risks to substance-specific endpoints (Van der Voet 2002).

2.2 Studying Firms, Sectors, and Geographical Areas

MFAs can also be applied to firms, sectors, or geographical areas to evaluate their environmental performance. For example, by applying an MFA to a firm producing chlorine (Cl2), we can calculate the material requirement for its production (resource depletion) and the wastes and emissions per tonne of chlorine produced. Figure 4 illustrates these figures for the production of 1 t of chlorine (Cl2) by electrolysis of sodium chloride (NaCl).

Fig. 3 Material flows in the production of major chlorine-based chemicals in the USA, 1993 (tonnes). (Source: Ayres and Ayres 1998)


Fig. 4 Chlorine production by electrolysis of sodium chloride (NaCl) in mercury cells, per 1,000 kg of chlorine, with 1,118 kg of caustic soda as associated co-product. (Source: Adapted from Ayres and Ayres 1998)

The inputs for this process are as follows: 1,711 kg of sodium chloride (NaCl), 499 kg of water (H2O), and 102 kg of other chemicals, which are not specified. As a result, 156 kg of waste is produced.

When applying MFA to a product or firm, a life cycle approach is normally taken, denoted life cycle assessment (LCA). An LCA accounts for the inflows and outflows of the system "from cradle to grave"; that includes all material inputs and outputs from extraction, manufacturing, consumption or use, recycling, and final disposal. The initial interest in developing LCA was to minimize energy consumption and solve waste management problems. The first LCA project, originally called REPA (Resource and Environmental Profile Analysis), was carried out by the Midwest Research Institute for the Coca-Cola Company in 1969. The goal was to compare several container options by quantifying the emissions, material, and energy consumption of each. Presently, LCA is standardized by the International Organization for Standardization in the ISO 14040 series and by the Life Cycle Initiative program, led by the United Nations Environment Programme (UNEP) and the Society of Environmental Toxicology and Chemistry (SETAC), created to develop and disseminate practical tools for evaluating the opportunities, risks, and trade-offs associated with products and services over their whole life cycle (Mila i Canals 2003).

Section 3 illustrates different MFA approaches. First, in Sect. 3.1, an analysis of the inorganic and organic chemical industry is given to quantify wastes and process conversion. Section 3.2 presents a material flow analysis of rare earth metals by current market demand, which is useful for quantifying the potential recovery of these critical metals from waste streams.


3 Case Studies

3.1 Material Balance Applied to Chemical Industry

Quantitative data about industrial chemicals can be estimated with reasonable accuracy from industry production statistics. In the USA, production statistics were published annually by the US International Trade Commission (USITC) until the mid-1990s. The USITC reports included production data for virtually all industrial chemicals, including intermediates. Hence, in this example, we use production statistics from the USITC for the years 1991–1993. Unfortunately, these reports are no longer published.

To compare inputs and outputs for the whole sector and avoid double counting, the list of chemicals can be divided into two groups: (i) basic chemicals, which are made directly from raw materials, and (ii) all others, including intermediates. Such a classification requires some knowledge of the industry. For instance, sulfuric acid is mainly made by burning sulfur but is now also produced as a by-product of copper smelting. Hydrochloric acid is no longer made from salt but rather as a by-product of many downstream chlorination processes and thus is not considered a "basic" chemical.

Based on process information for the basic inorganic and organic chemicals, raw material inputs are quantified in mass terms, whence a material balance by elements (C, H, O, N, Cl, S, Na, Ca, etc.) can be performed. The difference between mass inputs and useful outputs is wastes and emissions. For the industry as a whole, wastes are characterized by elemental composition, but they can be estimated approximately as a mix of compounds (CO2, CO, H2O, NaCl, CaSO4, etc.) based on knowledge of the process reactions.

3.1.1 Inorganic Chemicals

Based on process information, the basic inorganic chemicals included are sulfuric acid, ammonia, chlorine, and caustic soda. The total production of these four chemicals represents 75% of the total mass of inorganic chemical production (Ayres and Ayres 1999). Once the outputs are identified, we need to estimate the inputs. Mass inputs are identified based on the theoretical reaction for the production of each inorganic product.

Sulfuric Acid

Sulfuric acid can be produced from a large number of raw materials: crude oil and natural gas, copper and lead-zinc ores, organic spent acids, sulfur-containing gases, and sulfur salts. In practice, however, it is mainly produced from sulfur, oxygen, and water by single-/double-contact absorption – when the sulfur has been purified and dried – or by wet/combined dry-wet catalysis – when the sulfur originates from the burning or catalytic conversion of hydrogen sulfide (H2S) gases – as illustrated in the three reactions below:


Reaction (a): 2S + 2O2 → 2SO2
Reaction (b): 2SO2 + O2 → 2SO3
Reaction (c): 2SO3 + 2H2O → 2H2SO4

Ammonia

Ammonia is a starting material used in a wide variety of industrial chemicals and nitrogen fertilizers. The latter are responsible for 90% of all ammonia production, to obtain urea, ammonium nitrate, ammonium sulfate, and ammonium phosphates (Suresh and Fujita 2007). Most of the world's ammonia is produced from natural gas by steam reforming, except in China, where ammonia is produced from synthesis gas from coal. Steam reforming involves two main reactions: the separation of hydrogen from methane (Reaction a) and its recombination with atmospheric nitrogen (Reaction c):

Reaction (a): CH4 + H2O → CO + 3H2
Reaction (b): CO + H2O → CO2 + H2
Reaction (c): 4H2 + (4/3)N2 → (8/3)NH3

Chlorine and Sodium Hydroxide

Chlorine and sodium hydroxide are produced as co-products by the electrolytic decomposition of sodium chloride solutions obtained from brines. The electrolysis consists of using a direct electric current to drive chemical reactions, in this case to dissociate sodium chloride into sodium cations and chloride anions (Bommaraju et al. 2000). During the electrolysis process, chloride anions are oxidized at the anode to produce chlorine, while sodium cations combine with hydroxyl anions from water to form sodium hydroxide at the cathode. Besides chlorine and sodium hydroxide, hydrogen is also generated, as illustrated in the reaction below:

2NaCl + 2H2O → Cl2 + 2NaOH + H2

Results

The overall mass inputs for the production of the main inorganic chemicals are hydrogen, methane, nitrogen, oxygen, sodium chloride, and sulfur. Figure 5 shows the elemental and component mass balance for the production of basic inorganic chemicals in the USA in 1991. Performing a mass balance by both elements and components helps ensure the consistency of inputs and outputs. Mass inputs are estimated based on the production statistics of end products and the reactions illustrated above. The mass balance shows that about 9% of the total mass inputs are wasted as compounds made of carbon, chlorine, sodium, and sulfur. The process conversion, or yield, can be estimated by dividing the mass output of the products by that of the inputs. For inorganic chemicals, this conversion equals 91%.
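The estimation of theoretical inputs from end-product statistics can be sketched in a few lines of Python. The net reaction 2S + 3O2 + 2H2O → 2H2SO4 follows from summing reactions (a)–(c); the variable names are ours, and the results are theoretical minima rather than the chapter's reported stream values:

```python
# Theoretical raw-material inputs per tonne of H2SO4, from the net
# stoichiometry of reactions (a)-(c): 2S + 3O2 + 2H2O -> 2H2SO4.
M = {"H": 1.008, "O": 15.999, "S": 32.06}             # molar masses, g/mol
mw_h2so4 = 2 * M["H"] + M["S"] + 4 * M["O"]           # ~98 g/mol

mol_per_tonne = 1_000_000 / mw_h2so4                  # mol H2SO4 per tonne
inputs_kg = {
    "S":   mol_per_tonne * M["S"] / 1000,             # 1 mol S per mol acid
    "O2":  mol_per_tonne * 1.5 * 2 * M["O"] / 1000,   # 1.5 mol O2 per mol
    "H2O": mol_per_tonne * (2 * M["H"] + M["O"]) / 1000,  # 1 mol H2O per mol
}
print(inputs_kg)  # ~{'S': 327, 'O2': 490, 'H2O': 184} kg per tonne
# These are theoretical minima; the streams of Fig. 1 (330/500/190 g per kg)
# are slightly larger because real processes lose part of the input as
# emissions.
```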

Fig. 5 Mass balance by elements and compounds of the production of inorganic chemicals in 1991 (MMT). (Source: Ayres et al. 2011)

3.1.2 Organic Chemicals

Based on USITC statistics for 1991–1993, the major end products from organic chemicals are plastics (polyethylene, polypropylene, polystyrene, and polyvinyl chloride), nylon 6, ethylene glycol (antifreeze), and methyl tert-butyl ether (a fuel additive that has since been largely phased out). The production of these chemicals, in mass terms, represented about 80% of total US production of organics in 1991 (Ayres and Ayres 1998). Basic organic inputs are hydrocarbons, along with inorganic raw materials such as chlorine, sulfuric acid, ammonia, and caustic soda.

Most organic chemicals are produced from feedstocks from natural gas or petroleum refineries, with a very small share from coal. There are three categories of feedstock: paraffins, olefins, and cyclics/aromatics. Paraffins are saturated straight- or branched-chain hydrocarbons; examples include methane, ethane, propane, isobutane, and n-butane. Olefins are unsaturated aliphatic compounds with one or more double bonds; examples include ethylene, propylene, butylenes, and butadiene. Cyclic aromatics include benzene, toluene, xylene, cyclopentene, cyclohexane, and naphthalene. Table 1 shows the organic chemicals and their primary production.

The mass input of the inorganic raw materials chlorine, sulfuric acid, ammonia, and caustic soda is estimated based on their use patterns (Ayres and Ayres 1998). Figure 6 illustrates the material balance by elements and components for the production of the listed basic organic chemicals. The mass balance shows that about 60% of the total mass inputs are wasted as compounds of carbon, hydrogen, oxygen, nitrogen, chlorine, sodium, and sulfur. The process conversion, or yield, for organic chemicals is thus about 40%.
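A minimal sketch of the yield computation, using the aggregate 1991 masses reported in Figs. 5 and 6 (in MMT; the function and variable names are ours):

```python
# Process yield = useful product mass / total input mass, using the
# aggregate 1991 balances reported in the chapter (MMT).
def yield_pct(inputs_mmt, outputs_mmt):
    """Mass conversion efficiency of a sector, in percent."""
    return 100.0 * outputs_mmt / inputs_mmt

# Inorganic chemicals (Fig. 5): 49.34 + 42.48 MMT in, 83.92 MMT out.
print(f"inorganic: {yield_pct(49.34 + 42.48, 83.92):.0f}%")  # -> 91%

# Organic chemicals (Fig. 6): 73.45 MMT in, 28.99 MMT out.
print(f"organic:   {yield_pct(73.45, 28.99):.0f}%")          # -> 39% (~40%)
```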

3.2 Mass Balance Applied to Rare Earth Metals

A material flow analysis (MFA) helps determine the main entrance routes of rare earth (RE) metals into the economic system. Such quantification is useful for estimating the future potential recovery of these critical metals from end products once they reach their end of life. The amount of RE metals in intermediate and end products can be estimated based on the production and market share of each metal (Kingsnorth 2009; Chegwidden and Kingsnorth 2010; Morgan 2010; Schüler et al. 2011). The main functions of RE are as dopants for semiconductors, catalysts, electricity storage, alloying elements, additives in glass and ceramic, and abrasives. All these functions are used in intermediate and end products. For example, lanthanum is used as a dopant for semiconductors in phosphors (an intermediate product), and phosphors are used in liquid crystal displays, plasma flat panels, and lighting, all of them end products.

Table 1 US primary feedstock production in 1991 (MMT). (Source: Ayres and Ayres 1998)

Organic chemicals                               Formula   Mass (MMT)
Aliphatics and olefins
  C1  Methane                                   CH4        3.60
  C2  Acetylene                                 C2H2       0.14
      Ethylene                                  C2H4      18.12
  C3  Propylene                                 C3H6       9.77
  C4  Butylene                                  C4H8       1.05
      Butadiene                                 C4H6       1.39
      Butene                                    C4H8       0.43
      Isobutane                                 C4H10      0.50
      Isobutylene                               C4H8       0.44
      Other C4                                  C4H8       2.54
  C5  Isoprene                                  C5H8       0.21
      Pentene, mixed                            C5H10      0.19
      Other C5                                             1.29
  Cx  All other aliphatics (including methane)             5.56
Aromatics and naphthenes
      Benzene, all grades                       C6H6       5.21
      Toluene, all grades                       C7H8       2.86
      Xylenes, all grades                       C8H10      2.87
      All other aromatics and naphthenes                   1.54
Total inputs                                              57.70

Fig. 6 Mass balance by elements and compounds of the production of organic chemicals in 1991 (MMT). (Source: Ayres et al. 2011)

3.2.1 Dopants for Semiconductors in Phosphors

The principal applications of RE phosphors are in display screens (cathode ray, liquid crystal, and plasma) and in low-energy fluorescent lighting tubes. Each display technology requires different types and compositions of phosphors, as do fluorescent tubes, in which the phosphors reduce energy consumption and provide specific colors. Phosphors consist of a host material with an added activator or dopant. For red, the RE oxides used include yttrium, europium, and gadolinium. For green, the hosts used are lanthanum, cerium, and yttrium, while terbium and gadolinium are used as activators. For blue, europium oxide is mainly used. The combination of red, green, and blue gives white. In 2010, 8,250 t of RE was used to produce phosphors: 6,135 t for red, 2,065 t for green, and 50 t for blue. Phosphors are largely used for lighting (84%), followed by LCDs (12%) and plasma displays (4%).

3.2.2 Catalysts

In 2010, 22,920 t of RE was used as catalysts. About 70% was used in fluid catalytic cracking (FCC) and 30% in autocatalyst converters. Cerium and neodymium are also used in non-cracking catalytic processes such as ammonia synthesis, hydrogenation, dehydrogenation, polymerization, isomerization, and oxidation, and in automobile emissions control; however, the amounts used for these purposes are not published.

Fluid Catalytic Cracking (FCC)

FCC is mainly used in petroleum refining to break down long, complex organic molecules such as kerosene and heavy hydrocarbons into simpler and lighter molecules such as gasoline and liquefied petroleum gas. The most widely used catalysts are synthetic zeolites (zeolite Y and ZSM-5), which contain lanthanum and cerium to improve stability at high temperature and to increase catalyst activity and gasoline selectivity (Yang et al. 2003). Commercial catalysts are composed of 85% amorphous silica-alumina cracking catalyst and 15% zeolites, with a varying RE content of 0.2–3% (Estevao et al. 2005; Xiaoning et al. 2007; Schiller 2011). In 2010, the annual production of feedstock from FCC was 1,668 million liters. FCC catalysts containing primarily Y zeolites account for more than 95% of total consumption (Davis and Inoguchi 2009). Assuming that RE production for FCC requires 15,940 t of RE, an average of 9 g of RE is required per liter of feedstock, that is, about 0.2% RE content.

Autocatalyst Converters

RE metals are also key for autocatalyst converters to reduce emissions of carbon monoxide (CO), hydrocarbons (HC), and nitrogen oxides (NOx). They are added to the wash-coating to improve the thermostability of the alumina and to ensure the activation of the catalyst under high temperature. In 2010, the manufacturing of 78 million cars required a total of 6,980 t of RE, which gives an estimate of 90 g of RE per vehicle, one-third of the amount reported in 2003 (Xiaodong and Duan 2004). The composition of RE in converters is 90% cerium, 5% lanthanum, 3% neodymium, and 2% praseodymium. Thus, in 2010, 6,280 t of cerium, 350 t of lanthanum, 210 t of neodymium, and 140 t of praseodymium were used in internal combustion vehicles.

3.2.3 Electrical Storage in NiMH Batteries

A nickel-metal hydride (NiMH) battery is a type of rechargeable battery composed of a cathode, an anode, an electrolyte, and a separator, all assembled in a steel case. Xu estimated the following metal content: 50% nickel, 33% RE, 10% cobalt, 2% aluminum, and 6% manganese (Xu and Peng 2009). RE metals are mainly contained in the anode of NiMH batteries, which is described as AB5, where A stands for a lanthanide metal and B for nickel. In practice, lanthanum is substituted by lanthanum-rich mischmetal containing 50% lanthanum, 33% cerium, 3% neodymium, 10% praseodymium, and 3% samarium (Morgan 2010). In 2010, 12,670 t of RE was used in NiMH battery alloys. For HEVs, which represent 65% of the end use of NiMH batteries, the total amount of RE equals 8,060 t. According to Pillot, the remaining 4,610 t was used for retail products (toys and household tools), cordless phones, and other electric and electronic devices (Pillot 2011).

3.2.4 Alloying Elements

Almost all published estimates of the amount of RE metals used in metallurgy agree on a figure of 32,025 t (Kingsnorth 2009; Chegwidden and Kingsnorth 2010). Of this production, 75% is used in magnets and 25% in alloys with iron and aluminum. It is assumed that the usage of RE in magnets corresponds to the composition of neodymium-iron-boron (NIB) magnets, which on average contain 30% RE, 69% iron, and 1% boron (Morgan 2010). For 2010, we estimated a total of 24,060 t used in NIB magnets. Magnets are used in wind turbines, hybrid vehicles, magnetic resonance imaging (MRI), and electric and electronic devices.

Wind Turbines

In 2010, new wind turbine installations amounted to 36 GW, and only about 14% of the total new installations used NIB magnets (Schüler et al. 2011). Each MW of wind turbine capacity installed requires 860 kg of NIB magnets. Based on the composition given by Morgan, 910 t of neodymium, 310 t of praseodymium, 70 t of dysprosium, and 10 t of terbium were used for NIB magnets in wind turbines.

Electric Vehicles

The number of new hybrid cars registered reached 533,000 units in 2010. Assuming that each electric vehicle requires one electric motor per wheel and that the average amount of neodymium per electric vehicle is 6.3 kg, the total amount of RE is 4,800 t (Talens Peiró et al. 2013). The amounts of the individual RE metals are 3,358 t of neodymium, 1,148 t of praseodymium, 264 t of dysprosium, and 34 t of terbium.
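A minimal sketch reproducing the wind turbine and electric vehicle magnet arithmetic above (all figures are taken from the text; the variable names are ours):

```python
# RE in NIB magnets for 2010 wind turbine installations (figures from text).
new_capacity_mw = 36_000          # 36 GW of new wind capacity in 2010
nib_share = 0.14                  # ~14% of new installations used NIB magnets
magnet_kg_per_mw = 860            # NIB magnet mass per MW installed
re_share_of_magnet = 0.30         # NIB magnets are ~30% RE by mass

magnet_t = new_capacity_mw * nib_share * magnet_kg_per_mw / 1000
re_in_turbines_t = magnet_t * re_share_of_magnet
print(f"RE in wind turbine magnets: {re_in_turbines_t:,.0f} t")  # ~1,300 t

# Neodymium in 2010 hybrid/electric vehicle motors (figures from text).
vehicles = 533_000                # new hybrid cars registered in 2010
nd_kg_per_vehicle = 6.3           # average neodymium per vehicle
nd_in_evs_t = vehicles * nd_kg_per_vehicle / 1000
print(f"Nd in electric vehicles:    {nd_in_evs_t:,.0f} t")       # ~3,358 t
```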


Magnetic Resonance Imaging (MRI)

In 2010, 2,500 new MRI units were produced, each using an average of 860 kg of NIB magnets (Cosmus and Parizh 2011). The amount of RE required for them was 450 t of neodymium, 5 t of terbium, 155 t of praseodymium, and 35 t of dysprosium. Gadolinium is a minor RE metal used in the magnet sector as an MRI contrast agent and, in research on magnetic cooling, as a magnet component. As an MRI contrast agent, it improves the visibility of internal body structures by altering the relaxation times of the tissues and body cavities where it is present. Gadolinium is used in doses of about 0.01–0.03 g per kg of body mass (Niendorf et al. 1991). For the 80 million MRI exams performed in 2010, 90 t of gadolinium was used (Cosmus and Parizh 2011). Gadolinium is also used in magnetic cooling research as a powder for creating magnetic refrigeration; in 2010, about 390 t was used for this purpose.

Minor Alloys

RE mischmetal is also used in minor alloys for controlling inclusions and improving performance in steel and iron. For instance, cerium combined with sulfide forms more rounded particles that are less likely to cause cracking. RE mischmetal is used in zinc galvanizing applications, as in the zinc-aluminum alloy named Galfan (Zn-5Al-MM), which is often used as a coating for steel to enhance product life in certain applications. In 2010, 7,965 t of RE was used as mischmetal in iron and aluminum alloys.

3.2.5 Additives

Additives are substances added to preserve the quality and appearance of coatings and for coloring. They are widely used in the glass and ceramic industry for quality and as colorants. In 2010, the total amount of REE used as additives was 17,425 t: 37% in ceramic and 63% in glass.

Glass Industry

Cerium and lanthanum oxides are used in glass to overcome the yellow-green discoloration caused by iron oxide, always present as an impurity in glass. Cerium is also a good UV and IR absorbent and is thus used in protective glasses, in quantities of 2–4% for glass-blowing and welding goggles (Gupta and Krishnamurthy 1992). Lanthanum is used in silica glasses to give a high index of refraction and low dispersion in lenses for autofocus single-lens reflex (SLR) cameras and video cameras. Other REE used in lower amounts are neodymium, yttrium, and praseodymium. Neodymium and praseodymium are used for coloring glasses: neodymium colors glass bright red, praseodymium colors glass green, and their combination colors glass blue. Yttrium is used in the form of yttrium-aluminum garnet (Y3Al5O12) to form synthetic crystals that are widely used as an active laser medium in solid-state lasers. YAG lasers use neodymium for its optimal absorption and emitting wavelength and are used in various medical applications, drilling, welding, and material processing. Other metals used as additives are erbium, ytterbium, and holmium, which are used in luminescent solar concentrators, light sources for fiber optics, and laser materials.

Ceramic Industry

In 2010, 6,865 t of RE was used by the ceramic industry. The RE used were yttrium (3,495 t), lanthanum (1,190 t), cerium (980 t), neodymium (800 t), and praseodymium (400 t). Yttrium is used combined with silica for turbine blade applications. Cerium is used as a phase stabilizer in zirconia-based products and in various ceramics, including dental compositions. Yttrium and cerium are used in partially stabilized zirconia (PSZ) and tetragonal zirconia polycrystals (TZP), both high-performance ceramics with excellent toughness and strength properties at low and intermediate temperatures. End products of these ceramics containing yttrium are components for adiabatic diesel engines, cutting tools, wire-drawing dies, and furnace elements for use up to 2,000 °C in oxidizing atmospheres. Lanthanum is used in lead-lanthanum-zirconate-titanate (PLZT), a transparent ferroelectric ceramic material. Neodymium is used in ceramic glazes to produce blue to lavender colors. Praseodymium incorporated in a zirconium silicate lattice is used to produce high-temperature-resistant lemon yellow pigments for the ceramic industry.

3.2.6 Abrasives

RE oxides are excellent abrasives for glass polishing in the manufacture of LCDs, optical glass, mirrors, photomasks, plate glass, lenses, and cut glass. RE oxide powders provide high mechanical abrasion, react chemically with the surface of glasses, and give a high-quality finish (Xu and Peng 2009). There are various grades of RE oxide polishing powder: they can be composed entirely of cerium oxide or contain 45–75% cerium, with the remainder made up of other RE oxides. The average composition of polishing powders is 32% lanthanum, 65% cerium, and 4% praseodymium (Morgan 2010). In 2010, 13,750 t of REE was used as polishing powder in the glass industry, about 40% of which was consumed by the LCD industry.

3.2.7 Results of MFA of Rare Earths

With each of the rare earth metals identified and quantified in intermediate products, we can also estimate their content in end products. This quantification allows some estimation of their potential recovery. Not all of these metals can be recovered in practice; however, if we know what the intermediate and end products are used for, a theoretical recovery potential can be calculated for non-dissipative uses. For example, let us trace the 24,060 t of RE in magnets in 2010, which lie under the function "alloying elements" in Fig. 7. Magnets are used in wind turbines, electric vehicle batteries, magnetic resonance imaging (MRI), electronic products, and magnetic cooling applications. If we know the amount of each metal present in each of these end products, we can calculate what could, in theory, be recovered at the end of life. Following our example, based on several references and estimations, we calculated that 1,300 t of RE was embodied in magnets of wind turbines, of which 910 t was neodymium. Using the same approach, we found that 3,358 t of neodymium was used in magnets in electric vehicles, 450 t in MRI, and 11,980 t in electric and electronic devices, totaling 16,700 t of neodymium that could in theory be recovered at some point in the future. This is a substantial amount compared with neodymium production for that year, which was 21,615 t. According to Graedel, the actual end-of-life recycling rate for neodymium is less than 1% (Graedel 2011). If we look at phosphor applications, in 2010 a total of 8,250 t of REM was consumed. These ended up in lighting applications, liquid crystal display (LCD) screens, and plasma panels. If we trace europium, the only metal known that can emit blue light, 24 t was used in lighting, 20 t in LCDs, and 6 t in plasma panels, adding up to a total of 50 t of europium used for blue phosphors. Graedel estimates that less than 1% of europium is recycled at the end of life of these products. As the demand for rare earth metals increases in the future, it will become necessary to increase the recovery of these metals at the end of life of the products that contain them and to improve present production from base metals. MFA helps identify potential sources.

Fig. 7 Rare earth metals contained in intermediate and end products. (Source: Talens Peiró et al. 2012)
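The end-use tracing just described is simple arithmetic over the flow data. The sketch below (in Python, using the neodymium-in-magnets tonnages quoted above; the data structure and the classification of all four uses as non-dissipative are our illustrative assumptions) sums the metal embodied in non-dissipative end uses to obtain the theoretical recovery potential and compares it with production.

# A minimal sketch, assuming the neodymium figures quoted in the text;
# treating all four magnet applications as non-dissipative is an assumption.
nd_in_magnets_t = {                  # tonnes of neodymium embodied in magnets, 2010
    "wind turbines": 910,
    "electric vehicles": 3358,
    "MRI": 450,
    "electric and electronic devices": 11980,
}
dissipative_uses: set[str] = set()   # none of these magnet uses is dissipative here

recoverable_t = sum(t for use, t in nd_in_magnets_t.items()
                    if use not in dissipative_uses)
production_2010_t = 21615            # neodymium production in 2010, from the text

print(f"theoretical recovery potential: {recoverable_t:,} t "
      f"({100 * recoverable_t / production_2010_t:.0f}% of 2010 production)")
# -> 16,698 t (77% of 2010 production)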

4 The Laws of Thermodynamics

Material flow analysis is useful in quantifying resource consumption, wastes, and process losses. However, it becomes even more useful when combined with the first and second laws of thermodynamics. Energy, just like matter, can be neither created nor destroyed; energy is conserved in every action or transaction. That is the first law of thermodynamics, perhaps the most fundamental of all physical laws. But energy can be degraded, transformed into "less useful" forms such as low-grade heat. This is a consequence of the second law of thermodynamics, sometimes known as the entropy law, which states that global entropy increases in every irreversible process. Exergy is a measure of the potential work that can be performed by a system (Szargut et al. 1988). In other words, as a system degrades, its entropy increases and its exergy, or potential to do work, decreases. Exergy is not conserved: it is "used up" as work is done. Exergy is also a thermodynamic quantity that reflects the "distance" from thermodynamic equilibrium of a "target" material or subsystem. It is therefore only definable in terms of a reference state, such as the atmosphere, the ocean, or the average earth's crust, depending on which of them the subsystem will eventually rejoin and become indistinguishable from. Since the quantity of exergy contained in a subsystem is a measure of potential work, it is measurable in the same units as energy and work (joules, kWh, Btu, etc.). In the case of fuels, the exergy content is almost exactly the same as the enthalpy or heat content. For foods, the exergy content is essentially the same as the calorie content. For metals, the exergy content is the amount of heat that would be generated if the metal were to be completely oxidized. Exergy values have been calculated and tabulated in reference books such as Szargut et al. (1988).


Calculating all flows in exergetic terms allows us to include both materials and energy in one balance model. This is especially useful for calculating process efficiency and resource productivity. For more literature on exergy and material flow analysis, see the chapter "Materials Balance Models" in Ayres and Villalba Méndez (2011).
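As a simple illustration of what an exergy balance adds, the sketch below (hypothetical stream values, not data from the chapter) closes the balance around a single process. Unlike mass or energy, the exergy accounts do not close on their own: the shortfall is the exergy destroyed by irreversibility, and the ratio of useful output exergy to input exergy gives the process efficiency.

# A minimal sketch with made-up exergy values (MJ per batch); real stream
# exergies would come from tables such as Szargut et al. (1988).
inputs_MJ = {"fuel": 500.0, "ore": 120.0, "electricity": 80.0}
useful_outputs_MJ = {"refined metal": 210.0}
waste_streams_MJ = {"low-grade heat": 150.0, "slag and off-gas": 100.0}

exergy_in = sum(inputs_MJ.values())
exergy_useful = sum(useful_outputs_MJ.values())
exergy_in_wastes = sum(waste_streams_MJ.values())

# Exergy, unlike energy, is not conserved: whatever neither leaves as useful
# output nor is carried away by waste streams was destroyed by irreversibility.
exergy_destroyed = exergy_in - exergy_useful - exergy_in_wastes

print(f"exergy efficiency: {exergy_useful / exergy_in:.1%}")   # 30.0%
print(f"exergy destroyed:  {exergy_destroyed:.0f} MJ")         # 240 MJ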

5 Conclusions

Material balance is based on the mass conservation principle, which states that the sum of the weight of all inputs must be exactly equal to the sum of the weight of all outputs. Such a simple postulate provides significant information when used for evaluating systems. First, when inputs and outputs are known, the amount of waste can be calculated, and its composition can be identified when the chemical reactions involved are known. Second, it helps estimate the conversion or yield of processes, which serves as a measure of the efficiency of a process in mass terms. There are many possible applications depending on one's objective. For example, material balances at a geographical level help monitor the physical flows of materials across regional boundaries. At the global level, MFA can be used to identify the end uses of materials, which serves to estimate potential recovery. Decision makers, engineers, and researchers need appropriate information tools to evaluate resource intensity, processing technologies, and resource efficiency. Using material flow analysis to evaluate economic processes helps identify opportunities and strategies for reducing waste and increasing recovery and recycling. The combination of material balance and exergy analysis is the next logical step in accounting for wastes and emissions, since exergy gives a quantitative and qualitative measure of a material's potential usefulness.
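The two calculations named above, inferring waste by closing the balance and computing a mass yield, can be written in a few lines. The sketch below uses made-up process streams; the names and figures are hypothetical.

# A minimal sketch, assuming made-up process streams (kg per batch):
# mass conservation lets us infer the unmeasured waste flow and a yield.
inputs_kg = {"feedstock": 1000.0, "reagent": 150.0}
outputs_kg = {"product": 830.0, "recovered byproduct": 120.0}

total_in = sum(inputs_kg.values())
total_out = sum(outputs_kg.values())

waste_kg = total_in - total_out               # the balance must close exactly
yield_fraction = outputs_kg["product"] / total_in

print(f"waste: {waste_kg:.0f} kg")            # 200 kg
print(f"mass yield: {yield_fraction:.1%}")    # 72.2%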

References

Ayres RU, Ayres LW (1998) Accounting for resources 1: economy-wide applications of mass-balance principles to materials and waste. Edward Elgar, Cheltenham/Lyme
Ayres RU, Ayres LW (1999) Accounting for resources 2: the life cycle of materials. Edward Elgar, Cheltenham/Lyme
Ayres RU, Villalba Méndez G (2011) Materials balance models. In: Batabyal AA, Nijkamp P (eds) Research tools in natural resource and environmental economics. World Scientific, Singapore, pp 403–422
Ayres RU, Talens Peiró L, Villalba Méndez G (2011) Exergy efficiency in industry: where do we stand? Environ Sci Technol 45(24):10634–10641
Bommaraju TV, Lüke B, O'Brien TF, Blackburn MC (2000) Chlorine. In: Kirk-Othmer encyclopedia of chemical technology, 5th edn. Wiley, New York
Bringezu S, Moriguchi Y (2002) Material flow analysis. In: Ayres RU, Ayres LW (eds) A handbook of industrial ecology. Edward Elgar, Cheltenham/Lyme, pp 79–90
Chegwidden J, Kingsnorth D (2010) Rare earths – a golden future or overhyped? In: 20th industrial minerals international congress and exhibition, Miami
Cosmus T, Parizh M (2011) Advances in whole-body MRI magnets. IEEE Trans Appl Supercond 21(3):2104–2109
Davis S, Inoguchi Y (2009) Chemical economics handbook. Stanford Research Institute, Stanford
Estevao LR, Le Bras M, Delobel R, Nascimento RSV (2005) Spent refinery catalyst as a synergistic agent in intumescent formulations: influence of the catalyst's particle size and constituents. Polym Degrad Stab 88(3):444–455
Graedel TE (2011) On the future availability of the energy metals. Ann Rev Mater Res 41(1):323–335
Gupta CK, Krishnamurthy N (1992) Extractive metallurgy of rare earths. Int Mater Rev 37(5):197–248
Kingsnorth D (2009) The rare earths market: can supply meet demand in 2014? Prospectors and Developers Association of Canada, Toronto
Mila i Canals L (2003) Contributions to LCA methodology for agricultural systems. Institut de Ciencia i Tecnologia Ambientals ICTA, Universitat Autonoma de Barcelona, Bellaterra, p 250
Morgan JP (2010) Rare earths. We touch them everyday. Australia Corporate Access Days, New York
Mueller J, Griese H, Hageluken M, Middendorf A, Reichl H (2003) X-free mobile electronics – strategy for sustainable development. In: IEEE international symposium on electronics and the environment, Vienna, pp 13–18
Niendorf HP, Haustein J, Cornelius I, Alhassan A, Clauß W (1991) Safety of gadolinium-DTPA: extended clinical experience. Magn Reson Med 22(2):222–228
Pillot C (2011) HEV, P-HEV and EC market 2010–2020. Impact on the battery business. In: 4th international congress on automotive battery technology, Wiesbaden
Schiller R (2011) Optimizing FCC operations in a high rare earth cost market: part I. Refin Oper 2(15):1–2. Accessed 20 Jan 2013 at http://refineryoperations.com/downloads/refinery-operations_2-15_2011-08-03.pdf
Schüler D, Buchert M, Liu R, Dittrich S, Merz C (2011) Study on rare earths and their recycling. Öko-Institut e.V., Darmstadt, p 162
Suresh B, Fujita K (2007) Ammonia. Stanford Research Institute
Szargut J, Morris DR, Steward FR (1988) Exergy analysis of thermal, chemical, and metallurgical processes. Hemisphere Publishing Corporation, New York
Talens Peiró L, Villalba Méndez G, Ayres RU (2013) Material flow analysis of scarce metals: sources, functions, end-uses and aspects for future supply. Environ Sci Technol 47:2939. Accepted for publication
Van der Voet E (2002) Substance flow analysis (SFA) methodology. In: Ayres RU, Ayres LW (eds) A handbook of industrial ecology. Edward Elgar, Cheltenham/Lyme, pp 91–101
Xiaodong W, Duan W (2004) Development of auto exhaust catalysts and associated application of rare earth in China. J Rare Earths 22(6):837–843
Xiaoning W, Zhen Z, Chunming X, Aijun D, Li Z, Guiyuan J (2007) Effects of light rare earth on acidity and catalytic performance of HZSM-5 zeolite for catalytic cracking of butane to light olefins. J Rare Earths 25(3):321–328
Xu T, Peng H (2009) Formation cause, composition analysis and comprehensive utilization of rare earth solid wastes. J Rare Earths 27(6):1096–1102
Yang H, Wang H, Yu H, Xi J, Cui R, Chen G (2003) Status of photovoltaic industry in China. Energy Policy 31(8):703–707

Climate Change and Regional Impacts

Daria A. Karetnikov and Matthias Ruth

D. A. Karetnikov
University of Maryland, College Park, MD, USA
e-mail: [email protected]

M. Ruth
University of Alberta, Edmonton, AB, Canada
e-mail: [email protected]

Contents
1 Introduction
2 Expected Impacts Based on Level of Urbanization
2.1 Density-Dependent Impacts
2.2 Agriculture and Forests Impact
2.3 Natural Landscapes
3 Regional Differences in Risk and Mitigation Capacity
3.1 North America
3.2 Europe
3.3 Asia
3.4 Latin America
3.5 Africa
3.6 Australia and New Zealand
4 Interconnectivity and Reach of Impacts
5 Global Social Justice
6 Conclusion
Appendix
References

Abstract

The expected global impacts of climate change can be attributed to a set of common stressors. The magnitude of specific impacts, however, depends on the extent to which regional resources – from ecosystems to human-made infrastructures – are at risk and the abilities of regions to mitigate that risk. This chapter begins with an overview of some of the impacts expected from climate change, stratified by the density of populations and economic activities. Then we review differences in risk and mitigation capacities across major regions. The inherent interconnection of environmental, economic, and social dimensions of climate impacts underscores the need to assess climate change impacts in ways that address these dimensions.

Keywords

Climate change impact · Western European country · Road density · Southern European country · African Nation

1 Introduction

The arguments made by researchers, policymakers, and activists on the need to curb greenhouse gas emissions often revolve around an implicit or explicit understanding of the expected costs and benefits of probable impacts. The result of any structured comparison between costs and benefits, in turn, depends on which impacts are considered; how these impacts are defined and distributed across the economy, society, and the environment; how they are measured and weighted; and how they are aggregated to generate a net cost or benefit that then guides action. On the one hand, many of the expected impacts from climate change can be usefully discussed together – temperature increases, changes in precipitation, and an increase in the number of more severe weather events that threaten to disrupt urban and suburban centers; infrastructure networks like roads, energy transmission lines, and shipping routes; and evolved ecosystem relationships important to agriculture and natural landscapes in many densely populated regions. On the other hand, the situational variables – where an impact occurs, whom it affects, and to what extent – determine the magnitude of the damages and must be understood in the local context of the capacity to prepare for an event or deal with its aftermath. Regional variability cannot be subtracted out from global assessments of climate change impacts, and it introduces enormous complexities into the global decision-making process of designing and agreeing on a united response. Each region has its own economic and trade portfolios at risk and its own historical and political value systems for dealing with that risk. This chapter presents an overview of some of the impacts expected from climate change that are common across the world, organized by the level of urbanization – from densely populated urban areas to less-populated regions. Then we review differences in risk and mitigation capacities across major regions, how economic and political interconnectivity links regions more closely, and the differences in the means necessary to mitigate or adapt to the risk and to deal with the impact itself.

Much of our analysis relies on existing geophysical models and their estimates. Many such climate models exist that relate hundreds of complex global parameters to chart the circulation patterns between the Earth's atmosphere and its oceanic system and to derive potential responses to current emission trends. Uncertainty is an inherent part of the modeling process. Because of the differences in how the models treat the various parameters and their relationships, the Intergovernmental Panel on Climate Change (IPCC) uses a number of models' projections to arrive at its estimates of climate change impacts. Their details are described by the IPCC itself (Intergovernmental Panel on Climate Change 2014). Beyond the geophysical models, integrated assessment models (like the RICE and DICE models developed at Yale University, the Stanford-based MERGE model, the FUND model developed by the Dutch economist Richard Tol, or the PAGE2002 model used in the influential Stern review) provide the framework to picture how climatic changes may overlap with societal dimensions (for detailed overviews of such models, see Stanton et al. 2009; Ortiz and Markandya 2009). Often the models are used to calculate the net or the average impact of climate change – or to perform a benefit-cost analysis – arriving at conclusions that mask regional vulnerabilities. Among the poignant critiques of such benefit-cost analyses are that the underlying models tend to ignore the existence of low-probability but catastrophic outcomes and that they estimate future damage probability functions relying on a normal distribution rather than the so-called fat-tail Pareto distribution, which may more accurately represent the uncertainty involved given that current concentrations of greenhouse gases exceed any past experience. Simply substituting the type of probability function used alters the estimated probabilities of temperature changes experienced under different scenarios. For instance, at greenhouse gas concentrations double the 2011 levels, under the assumption of a normal probability distribution, the probability that global average temperatures will stabilize at 10 °C higher than the 2011 temperature is more than seven times lower than when using a "fat-tail" distribution (Weitzman 2011). Clearly, the size of the temperature increase determines the severity of damages. Other problems with using the models persist, such as finding the proper rate of discounting for estimating long-range impacts. In other words, seemingly small decisions about which parameters to include or exclude in the models can have a large impact on the outcomes presented – and on the policy debate. Rather than present a range of outcomes derived from such models, we construct a baseline look at what is at stake. Our goal is to disaggregate the potential regional impacts and to discuss them in two interrelated ways – one that uses theoretical and empirical underpinnings to gauge potential impacts based on the level of urbanization and another that connects the IPCC estimates to regional socioeconomic data to describe the range and location of potential impacts.
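To make the distributional point concrete, the sketch below compares the probability of an extreme warming outcome under a thin-tailed normal distribution and under a fat-tailed Pareto distribution matched to the same mean. The parameter values are purely illustrative and are not Weitzman's calculation.

# A minimal sketch with illustrative parameters: tail probability of extreme
# warming under a thin-tailed normal vs. a fat-tailed Pareto distribution.
from scipy import stats

mean_c, sd_c = 3.0, 2.0      # hypothetical warming projection (degrees C)
threshold_c = 10.0           # extreme outcome of interest (degrees C)

p_normal = stats.norm(loc=mean_c, scale=sd_c).sf(threshold_c)

# Pareto with shape b and scale chosen to match the same mean:
# mean = scale * b / (b - 1) for b > 1.
b = 3.0
scale = mean_c * (b - 1) / b
p_pareto = stats.pareto(b, scale=scale).sf(threshold_c)

print(f"P(warming > {threshold_c} C), normal: {p_normal:.1e}")   # ~2.3e-04
print(f"P(warming > {threshold_c} C), Pareto: {p_pareto:.1e}")   # ~8.0e-03
print(f"fat-tail / thin-tail ratio: {p_pareto / p_normal:.0f}")  # ~34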

2 Expected Impacts Based on Level of Urbanization

2.1 Density-Dependent Impacts

Regional differences in population densities bring with them differences in the diversity and extent of economic activities, differences in the need for infrastructure systems and services, and differences in stresses on the local environment. The latter range from the need to convert land to make room for people and their economic activities, to changes in water availability, air quality, and species diversity, and to changes in the local and regional climate. Global economic and environmental variables have become essential drivers behind these local and regional changes, often exacerbating already existing stressors on social, economic, and environmental conditions.

With most of the world's largest cities located close to oceans, lakes, or rivers, flooding has been and will continue to be a major concern for urban populations. While the average rate of sea level rise from 1961 to 2003 was approximately 1.8 mm per year, that rate accelerated to 3.1 mm per year during the last decade of that period (Intergovernmental Panel on Climate Change 2014). As a result, low-lying areas become more readily inundated during high-tide events, storm surges are magnified, coastal ecosystems and their abilities to protect inland areas are rapidly lost, and aquifers and agricultural soils near coasts lose productivity because of saltwater intrusion. Recent estimates suggest that by the year 2080, sea level rise and its associated impacts will affect five times more people than in 1990 (Nicholls et al. 1999) because both climate change and coastal population growth will accelerate. Some settlements, particularly many small island communities in the South Pacific, are likely to be completely submerged.

Increases in the frequency of heavy precipitation events – most notably rainstorms but also snowfall – have been observed globally throughout the twentieth century, and these trends are expected to continue throughout the twenty-first century. As a result, in some cases, the lakes and rivers on which cities are located will flood; local storm water and flood control systems will be overwhelmed; residential, commercial, and public infrastructures will be inundated and in some cases destroyed (United Nations Human Settlements Programme 2011); ecosystems will be impacted by increased loading of wastes – from untreated sewage to debris to runoff of fertilizers, pesticides, and other potentially harmful substances – and thus experience a loss of their water absorption capacities; water quality will be impaired; land will erode, leaving uphill populations to cope with the destruction of their living space and downhill populations with the challenge of dealing with the influx of materials – from soils to debris and other wastes; human physical and mental health will be compromised and lives will be lost; and economic productivity will be undermined, and with it the ability to prepare for future flooding impacts will be reduced, both because of the need to divert funds to emergency measures and rebuilding efforts and because the loss of economic activity may bring with it a loss of regional, national, and international competitiveness. In some instances, heavy precipitation and rising sea levels, particularly during tropical cyclones, will combine to affect millions of urban dwellers.

Extreme heat events are also predicted to become more frequent and intense. Impacts of heat on urban populations will vary considerably, depending on their acclimatization (Ruth et al. 2006) and their ability to invest in cooling – requiring access to air and space conditioning but also changes in building materials and designs, as well as development of green spaces in cities.
Some of these investments, especially where they will require increased energy demand, are bound to contribute to urban heat island effects (Akbari 2005), which in turn may set off a spiral of higher energy consumption, further increases in urban temperatures, changes in regional precipitation patterns, and declines in urban air quality, leading to further cooling needs. Particularly when heat waves coincide with droughts, exacerbation of urban heat island effects and stresses on water and energy supply and distribution ensue. And because not all sectors and households in the urban environment have equal need for or access to cooling, water, energy, disaster relief, and health services, climate impacts in urban areas will likely not be uniform and will often exacerbate already existing economic and social inequalities.

Urban areas, of course, are intricately linked to their hinterlands through exchanges of water, energy, agricultural and manufactured goods, services, and people – commuting or migrating, sending and receiving money, or setting trends and expectations. As a consequence, impacts of climate change on urban areas are likely to ripple through to affect larger regions and potentially the global flow of people, goods, and services, and vice versa: impacts on rural areas will make their mark on the economic, social, and environmental performance of cities. About half of the world's population lives within 200 km of a coastline (Small and Cohen 2004). One notable example of urban–rural interconnections concerns the provision and use of ecosystem services – from flood control to provision of building materials to supply of food and beyond. Such services are essential to revenue generation and quality of life in cities. However, approximately 60% of the ecosystem services evaluated in the Millennium Ecosystem Assessment are already considered degraded or used unsustainably (Millennium Ecosystem Assessment 2008). Increasing urbanization and climate change are likely to continue undermining the provision of ecosystem services, with far-reaching consequences for local, regional, and global sustainability.

2.2 Agriculture and Forests Impact

Opportunities to study and understand potential climate impacts on agriculture and forests are provided both through natural experiments – such as the El Niño/Southern Oscillation and North Atlantic Oscillation phenomena – and through deliberate manipulations in the so-called FACE experiments (free-air CO2 enrichment experiments). The former allow for inferences from temperature variations, while the latter test the impacts of an enriched carbon environment on the growth of plants and forests. One of the main findings is the difference in responsiveness between two types of respiration systems found in plants. Most herbaceous plants, including wheat, barley, oats, rice, and soybeans, have a C3 metabolic pathway; higher carbon dioxide levels enhance the growth of these plants. Corn, sugar cane, sorghum, millet, and tropical grasses are C4 plants. For these, higher temperatures (to a certain threshold) are beneficial, but they do not appear to respond to increased atmospheric CO2. Another physical response mechanism involves a long-evolved adaptation in plants: in times of decreased water availability, the stomata on leaves close, reducing evapotranspiration and, therefore, the plant's water demand. In fact, although some studies indicate that overall irrigation requirements will greatly increase in the USA in the upcoming decades, stomatal closure reduces the impact by around 35%. FACE experiments with forests indicate that young trees respond very well to increased CO2, although mature forests have a much lower (and in some cases negligible) response (Backlund 2009). Clearly, the distribution of plants with differing metabolic systems within and across continents is one component in identifying regional differences in risks.

An aspect of plant-level processes that adds variability to impacts of climate change on agricultural production comes from the different responses that plants show to changing ecosystem attributes, such as shifts in conditions favorable to insect populations, weeds, and disease agents. For instance, although soybean yields are projected to respond positively to higher temperatures, one experiment showed that damage from a harmful insect increased by 57% when it fed on soybeans grown under higher CO2 concentrations. The main unknown effect is the result of the competition between C3 and C4 plants. Many weeds are C3 plants, so their ranges will likely spread in the upcoming decades. On the other hand, the most popular herbicide, glyphosate, has been shown to lose efficacy on plants grown with more CO2. Another costly (but unaccounted-for) effect is the projected higher demand for nutrient inputs. Although water will be the limiting factor in some regions in the following two to three decades, productivity of forests and many crops (wheat, corn, and soybean, less so cotton) is expected to increase in other regions. In fact, timber productivity may increase by 20% to 60% in the next three decades. Beyond a certain temperature threshold, however, many crops will be unable to survive. If greenhouse gas emissions continue unabated, we will likely reach this point around 2050–2060. The concept of these thresholds is somewhat misleading, however, since the most productive and marketable yields are produced under much lower temperature conditions. For instance, even though the official temperature threshold for corn is around 35 °C, optimal yields are achieved between 18 °C and 22 °C (Backlund 2009).
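A common way to encode the temperature figures just quoted is a piecewise-linear ("trapezoidal") response function. The sketch below is our illustration, not a model from the chapter: the optimum range and survival threshold come from the corn figures above, while the lower growth limit is an assumed value.

# A minimal sketch: piecewise-linear relative-yield response to growing-season
# temperature; optimum 18-22 C and threshold ~35 C from the text, base assumed.
def relative_yield(temp_c: float,
                   t_base: float = 8.0,     # assumed lower growth limit
                   t_opt_lo: float = 18.0,  # optimum range, from the text
                   t_opt_hi: float = 22.0,
                   t_max: float = 35.0) -> float:  # threshold, from the text
    """Relative yield in [0, 1] as a piecewise-linear function of temperature."""
    if temp_c <= t_base or temp_c >= t_max:
        return 0.0
    if temp_c < t_opt_lo:                         # rising limb
        return (temp_c - t_base) / (t_opt_lo - t_base)
    if temp_c <= t_opt_hi:                        # plateau at the optimum
        return 1.0
    return (t_max - temp_c) / (t_max - t_opt_hi)  # falling limb

for t in (15, 20, 28, 34):
    print(f"{t} C -> relative yield {relative_yield(t):.2f}")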

2.3 Natural Landscapes

Grasslands, inland and coastal wetlands, shrub ecosystems, and other hotspots of ecological biodiversity will likely experience declines in area and probable changes in their functionality. The more fertile ecosystems with quick reproduction and decomposition rates may initially benefit from warmer temperatures and increased CO2 concentrations. Yet more severe storm events will likely counteract these benefits, as greater runoff rates contribute more nutrients to local waterways, stressing aquatic ecosystems and water resources. A comprehensive review of 866 studies on movements of ecological patterns found that several worrisome changes can be attributed to warming. Particularly at risk are range-restricted species and mountaintop species, which will see drastic contractions of their ranges; tropical coral reefs and amphibians are also affected; and many disruptions to coevolutionary relationships between predators and their prey, as well as between insects and plants, have been observed. Impacts on migratory birds and songbirds, butterflies and dragonflies, flowers like lilac and honeysuckle, and aquatic and tropical species are already apparent (Parmesan 2006). In most cases, ecosystem resilience is being tested not just by changes in climatic factors but also by other challenges associated with the expansion of human activity, such as fragmentation of suitable habitats when forests and grasslands are converted to agricultural or urban uses, chronic overuse of fertilizers, discharge of pollutants, interference with waterways, and deliberate or incidental introduction of alien species.

Beyond plant and animal biological responses to changes in levels of greenhouse gases, temperatures, or atmospheric moisture, agricultural and forest lands are also affected by damages from more frequent severe weather like massive flooding and extreme heat and drought. Recent examples of incidents that illustrate severe weather impacts include a series of heat waves in Europe in 2003, 2007, and 2010, immense landslides in South America and Asia in 2010 and 2011, the sweeping fires raging across vast swaths of Russia and Australia in 2010, and the powerful and devastating floods in Pakistan the same year. The impacts on urban infrastructure, agricultural fields, rural landscapes, and people's livelihoods are clearly immense.

3 Regional Differences in Risk and Mitigation Capacity

Some impacts from climate change will likely be similar across economic sectors in many countries around the globe. One useful way to understand such impacts is by differentiating them according to regional levels of urbanization. The severity of climatic effects further depends on a region's physical geography as well as regional and local economic and sociopolitical arrangements. For example, diversity in economic sectors, levels of investment in assets at risk from climate change impacts, and dependence on vulnerable infrastructure define economic threat. The built-in capacity of institutions to plan, respond, adjust, remain flexible, innovate, and cooperate across government offices and across international borders is inherently local and underlies the duration and magnitude of the impacts. The stressors on systems are similar across the globe: increases in temperature, changing precipitation patterns, rising sea levels, and more frequent intense storm events. The resultant effects determine shifts in species diversity and distribution, functional changes in natural and managed ecosystems, changes in water availability and pathogen transport, and disruptions to human-made infrastructure. The affected sectors are likewise similar. Agriculture, forestry, tourism, hunting and fishing, coastal real estate, and insurance, as well as physical property like buildings, bridges, roads, railroads, and airports, may see damages or disruptions as climate change intensifies.

Estimating impacts requires measuring many types of activities at their location – settlement and infrastructure sensitivity, food security and agriculture, ecosystem sensitivity to disturbances, human health sensitivity, and water resource sensitivity. The extent of these losses depends, first of all, on exposure, that is, the distribution of assets prone to risk. On that front, stark regional differences emerge. The Intergovernmental Panel on Climate Change (IPCC) provides the most comprehensive review of regional impacts (Intergovernmental Panel on Climate Change 2014). For the purposes of the 2007 review, the IPCC used its own regional designations to describe patterns of impacts estimated through several different global climate models. We match those designations with data from the World Bank and the Food and Agriculture Organization of the United Nations to sketch out the factors that form the baseline for regional differences, both in terms of impacts and in terms of capacity to respond. The snapshot we provide is from the decade after the international community started negotiating, in the 1990s, a united policy framework to deal with the factors contributing to climate change. We use the latest data available for the 2000–2010 period (details are described in the Appendix at the end of this chapter) and connect it to the climatic impacts the IPCC projects for the same regions. This aggregation serves to connect the socioeconomic layer to climatic projections. Yet it conceals many of the country-specific and within-country differences, as well as the ever-evolving status of socioeconomic indicators. Still, the aggregation allows for a regionally comparable look at the baseline conditions that already underlie the vastly different response capacities – both to mitigate and to adapt to the projected changes. Understanding the starting point onto which such future changes are projected is essential to understanding the entire scope of potential impacts. (All monetary figures are in US dollars.) The map below shows the regional designations used here (Fig. 1).

Fig. 1 IPCC designations for climate impacts
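Operationally, the matching described above is a merge-and-aggregate step. The sketch below (hypothetical file and column names, not the authors' actual code or data) joins country-level indicators to IPCC-style regional designations and computes simple regional baselines.

# A minimal sketch, assuming hypothetical CSV files and column names:
# country indicators (World Bank/FAO style) joined to IPCC-style regions.
import pandas as pd

indicators = pd.read_csv("country_indicators.csv")  # iso3, population, agri_gdp_share
regions = pd.read_csv("ipcc_regions.csv")           # iso3, ipcc_region

merged = indicators.merge(regions, on="iso3", how="inner")

baseline = merged.groupby("ipcc_region").agg(
    population=("population", "sum"),                # regional totals
    mean_agri_gdp_share=("agri_gdp_share", "mean"),  # simple country averages
)
print(baseline)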

3.1 North America

The continent of North America as a whole is projected to see more weather-related storms of all types and increases in associated damages. But in contrast to the western, southeastern, and northeastern regions, which have already incurred costs related to the changing climate, the northern portions of the continent may initially experience benefits as milder winters bring longer growing seasons. A study of potential economic impacts from climate change in the United States based on eight regions revealed the depth of those differences (Ruth et al. 2007). A snapshot of the socioeconomic situation in the 2000s shows that around a third of the US population resided along the coasts. But the population was not distributed evenly. A third of the country's private property was located on the northeastern coast, for example, which is also home to four of the largest cities in the United States. The eastern side of the continent is especially prone to impacts from rising sea level because a natural process of subsidence is already pulling the continental seaboard downward. The northeast portion of the country has already seen an increase in severe weather, with the largest increase in very severe events, complemented by a warming of 2.2 °C. An insurance company's analysis found around US$4 trillion in assets vulnerable to hurricanes in that area alone. Category 4 hurricanes, which are expected to become more common, could cost upward of $50 billion when touching down in heavily populated metropolitan regions (Ruth et al. 2007). Major American cities have points below sea level, including New Orleans, Miami, Jacksonville, Houston, Boston, New York, Washington DC, and Seattle. Five of the ten cities most exposed to a one-in-100-year flood event are here (Nicholls et al. 2008). The estimate of a flood of that magnitude occurring is based on historical data, and projections indicate that such events will become more and more frequent. All together, around US$19 trillion of insured property is potentially in the path of North Atlantic hurricanes. Since sea-surface temperature plays a major role in hurricane formation, scientists are exploring the possibility that climate change may indeed intensify storms. While it is still a matter of some debate, a recent study found a dramatic shift in the average annual number of tropical storms and hurricanes between 1995 and 2005: the previously steady rate of 9.4 storms jumped by over 50% to reach an average of 14.8 storms per year (Pearce 2005; Hecht 2007). States on the southern border of the continent – Texas, Alabama, Georgia, Florida, and North Carolina – have each seen over 20 natural disasters causing damages over $1 billion in the 25-year period between 1980 and 2005 (Lott and Ross 2006). In the United States in the 2000s, hurricanes caused average damages of $5 billion per year (National Oceanic and Atmospheric Administration 2010). Nearly 90% of the population of the continent lives in urban areas, implying that impacts that affect urban infrastructure may be more immediately relevant. The continent enjoys a vast transportation network with around 730 million kilometers of railroad and nearly eight million kilometers of roads, most of them in the United States. The southeastern, southern, and western regions of the continent will see severe challenges related to water availability. In addition to a complex political system that guides current water distribution in much of the western portions of the United States, climate change will bring a much drier climate there. Not only will water resources for human consumption be stressed but also the water needed to sustain natural ecosystems and fauna.

The center of the continent relies much more on agriculture. Many millions of hectares of agricultural crops that are not only used domestically but also exported throughout the world are grown in the United States. This means that changes in weather patterns, especially more frequent extreme weather events, cause much damage. Flooding in the Midwest – which has become more frequent – causes billions of dollars in damages at a time. For example, floods in the summer of 2008 caused $15 billion in damages in the region; crop damages totaled over $2 billion that year. On average, floods in the USA caused $5 billion in damages annually in the 2000s. Over the last 10 years, 15% of flood damages were to crops, among the most important agricultural commodities. The United States Global Climate Change Impacts team projects continued increases in precipitation and flooding in the center of the continent. Agriculture will see changes across the country. Overall, heat stress will likely alter the relative composition of pests, plants, and nutrients used. Increased insect outbreaks are possible from the northwestern reaches of the continent to the southwest (Karl et al. 2009). Temporarily, climate change may extend growing seasons and benefit the industry, such as in the Pacific Northwest. But other concerns may undermine this trend. For instance, many invasive species benefit from warmer climates, and in the 2000s they cost $120 billion a year to control (Pimentel et al. 2005). Although not many people work in agriculture, the sector is very important economically and geographically. Across the entire continent, agriculture added over $50 billion to the economy annually in the 2000s. In the United States, agricultural uses take up around half of the country's land area. Recently, studies have shown adverse impacts from climate change on specific agricultural industries across the nation – for example, dairy cows suffering lower productivity as temperatures rise or grape quality diminishing as springtime advances. Milk production and wineries are small but growing and profitable agricultural activities, especially in the United States. As research continues and focuses on more specific industries and locations, more impacts are revealed.

3.2 Europe

The European continent will see varying changes. For example, every scenario run through the models used for the latest IPCC report indicates that the northern areas of Europe will see greater warming during the winter months, while southern regions will need to prepare for hotter summers. Maximum temperatures experienced over an average year are projected to rise more steeply in central and southern European countries. The socioeconomic context of the 2000s provides the baseline conditions on which the impending changes will occur. A quarter of Europe's 600 million people resided in one of the southern European countries and the Mediterranean region – Italy, Portugal, Slovenia, Serbia, Albania, Greece, and others. Another quarter lived in eastern Europe. On average, projections show increases in precipitation in the north but decreases in the south – although the intensity of precipitation events will continue to strengthen across the entire continent. The incidence of heat waves and the duration of droughts will increase in central and southern European countries. Countries in western Europe and alpine regions will also see more dry periods and hot days. Melting of permafrost, less snowfall, and loss of glaciers in mountainous regions are expected, as well as increased flooding along the coastlines. In general, changes in precipitation will affect water resources, with consequences for both non-managed and managed landscapes. Since less wintertime precipitation will end up as snowpack, flows in major European rivers will increase during winters while decreasing in summer months. Eastern and southern Europe may see especially dramatic declines. Because of regional climatic trends, sea level rise along the European coasts may be 50% more than the expected global average. In 2009, the European Commission's Directorate-General for Maritime Affairs conducted a survey of 22 coastal European countries, finding that around US$700 billion to 1.4 trillion of assets were located within half a kilometer of a coast. Over a third of the GDP of these countries is created within 50 km of the coastline (Directorate-General for Maritime Affairs and Fisheries 2009).

Infrastructure impacts reverberate through many sectors. For instance, transportation reliability may suffer. In the 2000s, railroad systems in eastern and western European countries transported 95 billion and nearly 220 billion passenger-kilometers annually, respectively. The total network of roads was over two million kilometers in western Europe and around 1.5 million kilometers in the other regions. But again, distribution matters. On average per country, road density in western European countries was nearly eight times that of eastern European countries and four times that of southern European countries. The exposure portfolios are starkly different. For every 1.16 cars owned per person in western Europe (or 1.5 in North America), an eastern European owned about a third of a vehicle. This means practical differences in the level of development of infrastructure and, therefore, in its exposure to impacts. It also means that the personal mobility of families may be compromised in an emergency. Because many of the trends in climate change are ongoing, researchers have already observed impacts on those systems. The capacity to deal with the impacts relates to how exposed countries' portfolios are to particular risks. For instance, managed landscapes like croplands and fisheries will be at risk from a combination of factors, but the resultant damage may be less severe as affected parties adapt to changing conditions. The associated socioeconomic impact of that response comes down to several factors. To a large degree, the impact will depend on the amount of water available to cope with increasingly drier conditions. In southern European countries, about 10% of agricultural lands were irrigated – in contrast to less than 5% in northern Europe. On average, southern and eastern European countries were economically more dependent on agriculture than the other regions. Employment was more concentrated in agriculture too – with about a sixth of the population working in agriculture in southern and eastern Europe in the 2000s.
Changes in the climate threaten agricultural stability, potentially threatening the economic livelihoods of many people. On the other hand, on average, agricultural employment stood at 3% in western European countries and at less than 5% in northern Europe. The southern and eastern regions appear to be more vulnerable. For example, although mild warming will have little effect on agriculture, increases of over 5 °C can lead to 10% reductions in crops overall, but around 25% reductions in southern Europe (Agrawala 2007). Water availability and flow affect the capacity to produce hydroelectric power. In the 2000s, countries in northern Europe derived about a third of their total electricity production from such sources. Over a quarter of electricity production in southern European countries came from hydropower. Eastern European countries annually utilized over half of their internal freshwater resources, in contrast to western European countries using a third and southern European countries about a quarter, on average. This compares with less than 10% used by countries in North America, signaling some strain on the resource already across the entire continent.

3.3 Asia

The Asian region in the IPCC report spans from Russia and the Middle East through southern Asia to China, the Pacific coast, and its many islands. In the 2000s, around two-thirds of the world's population lived here. While the number of rainfall events in the region has declined, the severity of storms has gone up. Severe storm events are now more frequent and more intense, resulting in increased floods and landslides. This trend is expected to continue, as average precipitation is expected to increase across this sweeping region, especially across the boreal forests of Asia. But parts of the continent that are currently arid or semiarid – most notably regions in Pakistan, India, and Indonesia – will continue to experience decreases in rainfall and see an increase in the number of droughts. In a similar manner, the number of tropical cyclones in the Pacific has dropped, but the intensity and the resultant damages of the cyclones that do form have gone up. Heat waves have lengthened and will likely continue to lengthen across Asia. Annual warming of 3 °C by the 2050s and of 5 °C by the 2080s is projected on average. The highest warming rates have been observed in northern Asia, including Russia. Once again, risk to species and entire ecosystems is intensifying, altering functional relationships and pushing physiological boundaries. In 2009, the region was home to 4.2 billion people, with about 40% living in southern Asian countries like India, Bangladesh, Pakistan, Iran, and Afghanistan. Nearly as many people lived in eastern Asia – in China, Japan, South Korea, and North Korea. Another fifth of the population resided in southeastern Asia, in Indonesia, Thailand, Vietnam, Fiji, or the Philippines. Many of these countries are small island nations, which are especially vulnerable to sea level rise. Six of the ten coastal cities with the highest populations vulnerable to a very major flooding event are spread across Asia (Nicholls et al. 2008). One study of three megacities in Asia – Manila, Ho Chi Minh City, and Bangkok – found that by 2050 damages associated with floods will be more and more substantial – 2% to 6% of the region's GDP. Much of the damage was attributable to expected land subsidence (The World Bank 2010).

Intensified water stress can reduce crop yields of essential diet staples like rice, corn, and wheat. Yields for rice crops decline 10% for every degree Celsius increase. The area of land suitable for agriculture is projected to decline in eastern Asia, which now has 16% of its land classified as arable. Russia and countries in central Asia will likely see expanded agricultural production, yet it is uncertain how different crop portfolios will react to projected changes. For instance, China grew more than 30 million hectares of corn and nearly as many of rice. Russia and Pakistan each had 25 million hectares of land under cultivation for wheat production. In the 2000s, agricultural production took up large swaths of the land: around 44% of land in northern Asia and 60% in central Asia was under agricultural cultivation. Western Asian countries like Saudi Arabia, Iraq, Turkey, Israel, the United Arab Emirates, Georgia, Armenia, Lebanon, and others had over a third of their land in agricultural production, on average. The agricultural sector was much more important to the region's GDP and the population's employment than in Europe or North America. For example, in the 2000s, the agricultural sector contributed nearly a fifth of value-added activities to the GDP in central and southern Asia, where around 40% of people were employed in agriculture. Clearly, many people will be exposed to climate change impacts on agriculture. The same impacts also threaten natural ecosystems. Asia is home to some of the most biologically rich spots, like the ones in China, Japan, Russia, India, Indonesia, and Papua New Guinea. To a large extent, such biodiversity has continued to prosper because considerable portions of the continent were left undeveloped. For example, the infrastructure of the region in the 2000s was less extensive than in Europe or North America. Around 40% of the roads were paved in southeastern and northern Asia, each with about a million kilometers of total roads. Southern Asia had about five million kilometers of roads, many of them unpaved. The Asian continent – with both its megacities and sizeable rural populations scattered across the landscape – may be more vulnerable to certain climate change impacts than the regions discussed thus far because of its socioeconomic portfolio.
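The quoted rule of thumb for rice (a 10% yield decline per degree Celsius) can be applied to the average warming projected above. The sketch below compounds the decline per degree; whether the losses compound or add linearly is itself an interpretation, so the numbers are illustrative only (a linear reading would give 70% and 50% instead).

# A minimal sketch: applying the quoted 10% rice-yield decline per degree C
# to the average Asian warming projections in the text, compounding per degree.
decline_per_degree = 0.10

for warming_c, horizon in [(3, "2050s"), (5, "2080s")]:
    remaining = (1 - decline_per_degree) ** warming_c
    print(f"{horizon} (+{warming_c} C): {remaining:.0%} of current rice yield")
# -> 2050s (+3 C): 73% ; 2080s (+5 C): 59%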

3.4 Latin America

The Latin American region, as delineated by the IPCC, stretches from the Caribbean and Central America in the north to the very southern tips of Chile and Argentina, containing around a tenth of the world's population (about the same share as Europe) in 2009. The IPCC reports that most of the central belt of the South American continent has seen an increase in precipitation, although southern Chile and regions up along the western coast – southwest Argentina and southern Peru – have observed lower levels of rainfall. The Amazon has seen a 10% increase in flood frequency, and rivers in the center of the continent have had a 50% increase in their streamflows. Little data exist about the middle of the continent, making it difficult to discern any trends. But much is known about the glaciers, which are receding at an accelerating pace as temperature and humidity conditions change and precipitation cannot compensate for the rate of melting. The glaciers spanning the continent are projected to disappear by the mid-2020s. Not only is this a loss of an important ecological constituent; the disappearance of skiing slopes also threatens an important industry and limits recreational opportunities. Models predict warming for Latin America on average, with increases in temperatures from 1 °C to 7.5 °C by the end of the century. The occurrence of significant storms and of periods without precipitation will likely increase. The frequency and intensity of hurricanes around the Caribbean islands will also likely increase – the 2001 and 2005 seasons were two of the worst on record. The United Nations Environment Programme estimates that over 11 million individuals in Latin America were affected by natural disasters in 2001 and in 2005. Around 530 million people lived within 100 km of the coast in 2005 (United Nations Environmental Programme 2008).

In the central portion of South America, agricultural production has benefited from greater precipitation – soybean yields increased by up to 38% and corn by up to 18%. But natural land cover is retreating as tropical deforestation continues, fueled by the booming prices of agricultural crops like soybeans and corn (although, notably, by 2012 the rate of deforestation had fallen). In the 2000s, agriculture took up nearly a third of the land area of the Caribbean and South American countries. It also contributed roughly 10% of the GDP of those regions. Large-scale crop production was not widespread in the Caribbean, where only Cuba, Haiti, and the Dominican Republic had sizable plots of rice and corn. Expansive countries like Brazil and Argentina had around 20 million hectares of soybeans each. Climatic changes affect agricultural production and agricultural prices. Future planting decisions may need to take into account an agricultural commodity's resistance to a host of climatic variables. On the other hand, as mentioned, many areas of the continent have seen positive effects from the climatic changes. Plus, much of the region has plentiful water resources, sparing it the impacts associated with water availability. In contrast to some other regions, in the 2000s, South American and Central American countries used their internal water resources sparingly, withdrawing 2.4% and 4.4% of their freshwater annually. Still, other areas faced a different situation. Caribbean nations withdrew more than 20% on average, making them more vulnerable to upcoming changes in water availability. Plus, hydroelectric sources constitute the main source of electricity for many Latin American nations. In the 2000s, a large portion of the populations lived in rural areas – 40% in Caribbean and Central American countries and 25% in South American countries. The difference in infrastructure development has the same implications as discussed before. Less concentration of people makes the region less vulnerable to some of the impacts. Yet there are still incredibly dense areas, like the Caribbean nations where 60% of the population lived in the largest city. That region is also the one more likely to suffer from an increased frequency of extreme weather events. There, much infrastructure is at stake, since road density in the Caribbean nations is more than 12 times that of the open South American continent.
Nonetheless, as a whole, the continent seems to be fairly resistant to the most serious impacts, at least in the short- to medium-term timeframe.

3.5 Africa

The African continent is home to one-seventh of the world's population. The IPCC warns that it will likely be hit the hardest by a combination of changing climatic factors and existing development challenges. The continent is projected to see a 3–4 °C increase in mean annual temperature, but the northern and southern parts will likely see disproportionate warming, with up to a 9 °C increase in the north during the summer and up to a 7 °C increase in the south during the spring (September to November) by the end of this century. Northern African countries are some of the driest in the world – they receive less than 200 mm of rainfall per year on average, in contrast to the arid Middle Eastern countries, which receive 330 mm of rainfall per year on average. Projections for changes in precipitation in Africa are less clear, especially across the Sahel. Generally, however, precipitation will likely decrease in the northern African countries by up to 20%. It will follow a similar pattern in the south but will increase in the east. It is uncertain which precipitation trends will emerge across the expansive Sahel region. Still, the number of extreme dry and wet years is projected to rise – an expectation consistent across the globe.

Although a tremendous shift from rural to urban lifestyles has been ongoing across the continent, in the 2000s a large proportion of the population still lived in rural areas. This proportion was largest in eastern African nations, at 68%, and lowest in northern African nations, at 41%. Agriculture was a significant portion of the region's GDP – reaching 32% in western African nations and 25% in eastern Africa. Employment in agriculture was also high, with nearly 50% of the population working in the sector in the western and central African regions. Whereas about half of agricultural cropland in the North American nations is dedicated to just three crops – corn, soybeans, and wheat – much of Africa's agriculture is done on a smaller scale, translating into more diversified cropping patterns. Yet as climate change progresses, agricultural production will change as well. One study indicated that wheat may disappear from the continent altogether. Some positive effects on agricultural crop production are possible in the eastern areas, but northern, central, western, and southern Africa showed negative trends. Libya, Sudan, Egypt, and other northern African countries are facing severe water shortages; these three countries already withdraw much more freshwater than is available internally. For the 250 million people in the northern and southern African regions whose water resources are already stretched and for whom climate change will likely mean a decrease in precipitation, the situation will worsen. Western and eastern African nations, however, may see some relief. In the 2000s, these regions annually withdrew 12% and 34% of their internal freshwater resources, respectively. This could mean that enough water may be available to sustain the central and eastern African regions' reliance on hydroelectric sources, which generated nearly 80% and 63% of their total electricity production, respectively. In contrast, northern and southern African nations got 8% and 23% of their electricity from these sources. The road network in Africa was not as extensive as elsewhere, indicating that movement may be difficult in an emergency. In comparison to road densities in North America and Australia of about 40 km of road per 100 km² of land, road density in Africa ranges from around 8 km of road per 100 km² of land in central Africa to 28 km in the eastern countries. Many of those kilometers are not paved, however: about 15% of roads were paved in central Africa, 20% in western Africa, and 66% in northern Africa. Vehicle use there was the lowest of all the world regions. This cursory overview does little justice to the extent of the social, developmental, and political challenges many regions of the continent face. Impacts from climate change, unfortunately, appear to be yet another challenge for many areas. The damages from the impacts may be magnified because of inadequate capacity – structural and political – to reform and to adapt.

3.6 Australia and New Zealand

Australia, New Zealand, Tasmania, Fiji, and Samoa are projected to experience significant warming by the end of this century. Temperatures in the central portions may increase by up to 8 °C, with smaller increases closer to the coast – up to 5.4 °C within 400 km of the coast. Near the coast, precipitation is expected to drop by up to 80%. Southern and subtropical Australia will also see decreases, but the northern and central territories may see some increases. Just as on the North American continent, southwestern regions of Australia will see many more droughts – up by 80% by 2070, as simulated by one study. New Zealand will likely also experience more frequent severe droughts; its precipitation will increase in the west and decrease in the east.

In the 2000s, the region's population of 27 million people (0.4% of the world's population) was concentrated along the coasts: 85% of Australians lived within 50 km of the coastline, as did nearly 100% of Tasmanians and all residents of the small islands. Everyone in New Zealand lives within 100 km of the coast. This region of the world is especially vulnerable to the impacts from climate change related to encroachment of seawater and more frequent storms and hurricanes. Australia, New Zealand, and the small island nations are projected to experience particularly harsh impacts from climate change, and some of these – like changes to natural ecosystems and stress on available water – are already underway. Economic damages from more frequent natural disasters, such as floods, droughts, storms, or landslides, are rising. The 2002 drought cost Australia $7.6 billion. New Zealand alone suffered losses to its agricultural sector of $800 million in the multiyear droughts of the late 1990s, and floods cause an annual $85 million in damages there.

The populations of these nations were concentrated fairly tightly, with nearly half living in the largest city. Australia and New Zealand's rural population was about 10% of the total, while Fiji's was closer to 50% in 2009. The road density in Australia was close to that in South America or central Asian countries like Kazakhstan or Uzbekistan, and the road network in Australia and New Zealand was about one-sixth the extent of that in the United States. This means that less infrastructure is in the path of destruction, yet mobility may be more limited. The fairly sizable rural populations also mean that agriculture is an important source of income for many – more comparable to the numbers for Europe and North America. On average, 8% of the continent's GDP was attributable to the agricultural sector, which employed around 5% of the population in the 2000s. Agricultural productivity may be compromised as a result of temperature increases and changes in precipitation. In the 2000s, Australia received about 530 mm of precipitation each year – less than a third of what New Zealand received and comparable to the average on the southern tip of Africa. Projections of lower precipitation across much of the continent put in question the continued expansion of agriculture (which grew nearly 7% in Australia but contracted 12% in Fiji, according to the latest available numbers). Lower water supplies may also alter how much electricity is derived from hydroelectric sources: while Australia derives just over 5% of its electricity from hydro, New Zealand draws more than 55%. Overall, the continent has strong baseline conditions to deal with many projected impacts in the short term, but its capacity may be strained as impacts intensify.

4 Interconnectivity and Reach of Impacts

The economies of the regions described above are highly interconnected globally. This interconnectivity can propagate environmental impacts in one region to many others – one poignant example was the rapid worldwide drop in sales of Japanese automobiles when production halted following the disastrous tsunami in the spring of 2011. The magnitude of such ripple effects depends on the extent to which a country's economy is tied to global markets. International tourism is a particular case in point. This service industry is the economic backbone of many beach destinations, skiing and winter sport spots, and wildlife-touring locations. In the 2000s, tourism attracted over 70 million people from around the world to the North American continent annually, generating over $100 billion in receipts, and travel services accounted for over a quarter of the countries' average commercial services exports. Uncomfortably high temperatures, lack of snow, and degradation of ecological landscapes can diminish the sector's contribution to national economies. Several European countries share the fear of declining tourism because of climate change, especially for the lucrative snow sports industry hugging the Alps, whose annual profits are on the order of $70 billion. Only a third of the current number of resorts will remain if warming reaches 4 °C, and those able to remain open may need to supplement snow through artificial means, raising their operating costs and reducing their competitiveness. In the 2000s, southern European economies were also dependent on international tourism, which accounted on average for 20% of the total export bill. Altogether, Western European countries hosted 150 million tourists annually, while southern European countries welcomed 140 million. Interconnectivity is strong not only through travel but also through trade. This is especially true for the Western European countries, where two-thirds of GDP came from exports; a third of southern European countries' GDP related to exports.


Asian countries' economies are also closely tied to trade relationships with other nations. In the 2000s, exports were growing at very high rates in many Middle Eastern countries – Qatar, Oman, Turkey, Saudi Arabia, Afghanistan – and in countries previously tied to the Soviet Union: Turkmenistan, Tajikistan, Georgia, Armenia, and Azerbaijan. For eastern Asia, exports accounted on average for three-quarters of the countries' GDP. International tourism was booming: all of Asia welcomed around 330 million tourists in the 2000s – nearly 30% of total world traffic – and brought in over $265 billion. Eastern Asian countries, especially Hong Kong, China, and Korea, are particularly active in the global stock market, trading annually in stocks three times the value of their GDP, on average. Latin American countries also leaned heavily on exports to support their economies and likewise found international tourism to be a profitable business: it brought in around $60 billion for the Latin American region, with over 60 million annual visits in the 2000s. The Australia and New Zealand economies rely less on trade but have a lucrative international tourism industry; although the region attracted less than 1% of the world's international tourists, with around eight million visitors, it drew 3.2% of the tourist receipts. The situation in Africa in terms of exports is much different from that in the rest of the world. Although trade constitutes a large portion of most African nations' GDP, export and import volumes for the continent were relatively low during the same time frame. For instance, the value of Africa's imports and exports was a tenth of Asia's. African exports were growing in the eastern and western regions but stagnated in central and northern Africa, even though those areas were net exporters of energy. Africa drew in 60 million tourists a year – about the same number as Latin America or North America. North American countries, however, received twice as much revenue from their guests as their counterparts in Africa. The variable – and ever-changing – interconnectivity of the world is a difficult dimension to account for when estimating potential impacts of climate change. Each region has a unique suite of economic, social, and historical relations with other nations and experiences a different mix of climate impacts. The baseline conditions of the 2000s described above give some perspective on where the different regions stood in their economic vulnerabilities. The capacity to deal with the consequences is another dimension that complicates the issue of climate change.

5 Global Social Justice

While every continent will see impacts from climate change, some regions are much better prepared to deal with the consequences. Countries on the North American continent, for instance, have a very high average per capita GDP of nearly $50,000, affording them the financial means to address some of the consequences. Virtually the entire population enjoys modern sanitation facilities and has access to clean water and sufficient food. In the 2000s, population growth was less than 1% per year across North American countries. Resources and time are on the region's side. Still, challenges remain. Aging infrastructure across the United States is a growing problem.


In 2009, the American Society of Civil Engineers split the country's infrastructure resources into 15 categories, giving only four categories a grade of C or C+, while the rest received D or D–. Maintaining drinking water systems requires an additional $11 billion in annual funds. At least $100 billion is needed to update the nation's levee system. The government spent less than 40% of the $190 billion needed to preserve the many kilometers of road.

Many European countries likewise enjoy solid baseline conditions for dealing with climate change impacts. But sharp differences emerge from region to region. GDP per capita ranges from a low of $9,000 in current US dollars, on average, in Eastern European countries to a high of $78,000 in Western European countries. Economies are diversified, and the populations are well educated for the most part. This gives some economic stability and capacity. Challenges abound, however. Southern European countries have an average unemployment rate of over 16%, for instance, and their workers attained lower educational levels than their counterparts in other parts of Europe. Families with low incomes have fewer resources to deal with the effects of storms or increased temperatures; the cost of cooling equipment and energy can be prohibitive. Migration into western and northern European countries is also high, with nearly three million people migrating there annually. Even more – 3.5 million people – moved to southern European countries, particularly to Spain and Italy. An influx of people stretches institutional resources, perhaps leaving less for adaptation or mitigation purposes. Migration is a hot-button issue in Europe, and one concern there is that migration will increase as people move to avoid the worst-impacted areas.

Countries in the Asian region span a full spectrum of economic and political developmental stages. As a result, government capacity to respond to costly emergencies varies, as do health infrastructure, food security, and the capacity of physical systems to handle growing populations and growing demand for a higher standard of living. In the 2000s, GDP per capita ranged from very low levels (an average of less than $2,000 in current US dollars in southern Asian countries and around $2,800 in central Asian countries) to medium levels ($5,100 on average in northern Asian countries and $6,500 in southeastern Asian countries) to relatively high incomes of $18,000 in western Asia and $26,000 in eastern Asia. Clean sanitation facilities were available to 60% of the people residing in southern Asian countries, 67% in southeastern and northern Asia, and 80% in eastern Asia. Clean water was available to over 83% of the population, on average, in every country. Such present-day challenges undermine the regions' capacity to respond to impacts from climate change. Sanitation and availability of clean water are crucial tools in preventing the spread of disease vectors, many of which may benefit from warmer temperatures.

Latin American countries have varying availability of infrastructure. While they still have many unpaved roads, over 90% of the population has access to good water sources and around 80% to clean sanitation. The GDP per capita there is comparable to medium-range incomes in Asian countries: individuals in the Caribbean Basin have around $10,000 in current US dollars, while those in Central America have less than $4,500. Individual capacity to respond to disaster depends in part on the availability of resources.


Many of the African nations were impoverished in the 2000s. The western African region has the lowest GDP per capita in the world, at $836. Eastern African nations average a GDP per capita of about $1,500 – nearly $400 less than the southern Asian region, which has the lowest GDP per capita in Asia. The northern African region has the highest GDP per capita on the continent, at around $4,000 – less than half the figure for Eastern European nations and nearly a twentieth of that for Western European nations. This discrepancy points not only to different levels of economic development but also to differences in the individual capacity to deal with a crisis. Adequate health and environmental resources are also necessary to expand individual capacity. About a third of the population lacked access to clean water in the central, eastern, and western African regions. Three-quarters lacked access to clean sanitation facilities in western Africa, and two-thirds lacked access in central African nations. The northern and southern regions were better off: 80% of people in the north had access to both clean water and sanitation, while in the south half of the people had access to sanitation facilities and 86% to clean water. So although actual impacts from climate change will affect the northern and southern African regions disproportionately, some capacity exists there to adjust.

Australia and New Zealand have capable and stable governmental, economic, and physical structures. The GDP per capita averages nearly $20,000 between New Zealand and Australia, but the GDP per capita in Samoa and Fiji is about a tenth of that. Almost the entire population has access to clean water and clean sanitation facilities. There is virtually no malnutrition, whereas in all of the regions in Asia around a quarter or more of children under 5 were malnourished (in southern Asia the rate is 41%, as measured by the height-for-age ratio), and around the same in Africa – with 45% of the children malnourished in eastern African countries. Inability to deal with existing problems reveals the limits of institutional capacity; more disruptive weather events promise to test it further.

6 Conclusion

This chapter presented an overview of the range and magnitude of resources at risk from climate change and highlighted the overlapping climatic, ecological, and socioeconomic dimensions. No particular impacts are certain in their magnitude or timing; trends, however, are clearly emerging. Still, uncertainty is part of all projections, especially complex ones like those linking global climate change with regional economic and social dynamics. The scientific work of connecting natural processes to observed and anticipated impacts is on solid footing. The sociopolitical portion is much less certain, not least because we have few ways to communicate impacts beyond economic terms. At a minimum, economic measures provide a convenient common ground for comparing fairly simple concepts across regions, such as damage to physical coastal property, rising insurance costs, foregone prices of ruined agricultural crops, declining receipts in the tourism sector, or repair bills associated with infrastructure damage. More nuanced concepts – the fairness of the associated regional distributions of those impacts, the social costs of human suffering related to increased incidence of disease and disaster, the environmental stress inflicted on natural systems, the widely differing adaptive capacities, or the fairness questions that divide the developing and developed nations – do not easily fit into standard models of climate impacts and responses. They can hardly be reduced to numbers, let alone to dollars. Yet such nuanced and multidimensional assessments will be needed to better understand regional impacts and guide responses.

Appendix

All economic data used in this report came as country-level indicators from the World Bank's World Development Indicators and Global Development Finance databases and from the Food and Agriculture Organization of the United Nations. We use the latest available statistic for individual countries from 2000 to 2010 for the World Bank data and from 2000 to 2009 for the FAO data (World Bank 2011; FAO 2009). Countries are assigned to regions according to the regions in the IPCC report and to subregions using the UN Geoscheme categories, because the IPCC report did not provide a country-by-country breakdown of the regions. We made the following modifications. Russia and Mongolia were placed in a Northern Asia region in line with the IPCC report; this is not a region within the UN Geoscheme. Several small island nations not listed in the UN Geoscheme were assigned to the closest IPCC region. We also grouped the Melanesia, Micronesia, and Oceania islands with the Australia and New Zealand region – although, because of a dearth of socioeconomic data for these nations, many of them were not included in the analysis. We renamed the Middle Africa region the central African region for consistency. All data are reported either as the average per country or as the total across countries with available data for the region. All monetary figures are in current US dollars.
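To make the aggregation procedure concrete, the sketch below illustrates in Python (with pandas) how country-level indicators of this kind might be rolled up into regional statistics: keep the latest available observation per country within the time window, then average intensive measures (like GDP per capita) and total extensive ones (like tourist arrivals) by region. The data values, column names, and region labels are hypothetical placeholders, not the actual data or code behind this chapter.

```python
# Minimal, hypothetical sketch of the appendix's aggregation procedure.
import pandas as pd

# Hypothetical country-level records; real inputs would come from the
# World Bank WDI/GDF and FAO databases, one row per country-year.
data = pd.DataFrame({
    "country":  ["Kenya", "Ethiopia", "Morocco", "Egypt", "Mongolia"],
    "region":   ["Eastern Africa", "Eastern Africa",
                 "Northern Africa", "Northern Africa", "Northern Asia"],
    "year":     [2008, 2009, 2007, 2009, 2006],
    "gdp_pc":   [780, 350, 2850, 2450, 1630],   # current US$ per capita
    "tourists": [1.2, 0.4, 7.9, 11.9, 0.4],     # millions of arrivals
})

# Keep the latest available observation per country within the window
# (2000-2010 for World Bank data; 2000-2009 would apply to FAO data).
window = data[data["year"].between(2000, 2010)]
latest = (window.sort_values("year")
                .groupby("country", as_index=False)
                .last())

# Regional figures: per-country averages for intensive measures,
# totals for extensive ones, plus the country count for context.
regional = latest.groupby("region").agg(
    avg_gdp_pc=("gdp_pc", "mean"),
    total_tourists=("tourists", "sum"),
    n_countries=("country", "count"),
)
print(regional)
```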

References

Agrawala S (2007) Climate change in the European Alps: adapting winter tourism and natural hazards management. OECD, Paris
Akbari H (2005) Energy saving potentials and air quality benefits of urban heat island mitigation. Lawrence Berkeley National Laboratory, Berkeley
Backlund P (2009) Effects of climate change on agriculture, land resources, water resources, and biodiversity in the United States. U.S. Climate Change Science Program and the Subcommittee on Global Change Research, Washington, DC
Directorate-General for Maritime Affairs and Fisheries (2009) The economics of climate change adaptation in EU coastal areas. Summary report. Publications Office, Luxembourg
Food and Agriculture Organization of the United Nations (2009) FAOSTAT, 2009. Resource document. http://faostat.fao.org/. Accessed 28 Sept 2011
Hecht J (2007) Atlantic hurricane frequency doubled last century. New Scientist, August 4. https://www.newscientist.com/article/mg19526154-800-atlantic-hurricane-frequency-doubled-last-century/. Accessed 18 Sept 2018
IPCC (2014) Climate change 2014: synthesis report. Contribution of working groups I, II and III to the fifth assessment report of the Intergovernmental Panel on Climate Change [Core Writing Team, Pachauri RK, Meyer LA (eds)]. IPCC, Geneva, 151 pp
Karl TR, Melillo JM, Peterson TC (2009) Global climate change impacts in the United States: a state of knowledge report from the US Global Change Research Program. Cambridge University Press, New York
Lott N, Ross T (2006) Tracking and evaluating U.S. billion dollar weather disasters, 1985 to 2005. NOAA's National Climatic Data Center, Asheville
Millennium Ecosystem Assessment (2008) Ecosystem change and human well-being: research and monitoring priorities based on the Millennium Ecosystem Assessment. International Council for Science, Paris
National Oceanic and Atmospheric Administration (2010) Economics of heavy rain & flooding data and products (costs). Resource document. http://www.economics.noaa.gov/?goal=weather&file=events/precip&view=costs. Accessed 7 July 2011
Nicholls RJ, Hoozemans FMJ, Marchand M (1999) Increasing flood risk and wetland losses due to global sea-level rise: regional and global analyses. Glob Environ Chang 9:S69–S87
Nicholls RJ, Hanson S, Herweijer C, Patmore N, Hallegatte S, Corfee-Morlot J, Château J, Muir-Wood R (2008) Ranking port cities with high exposure and vulnerability to climate extremes. Organisation for Economic Co-operation and Development, Paris
Ortiz RA, Markandya A (2009) Integrated impact assessment models of climate change with an emphasis on damage functions: a literature review. Basque Centre for Climate Change, Bilbao
Parmesan C (2006) Ecological and evolutionary responses to recent climate change. Annu Rev Ecol Evol Syst 37:637–669
Pearce F (2005) Is global warming making hurricanes stronger? New Scientist, December 3. https://www.newscientist.com/article/mg18825281-300-is-global-warming-making-hurricanes-stronger/. Accessed 18 Sept 2018
Pimentel D, Zuniga R, Morrison D (2005) Update on the environmental and economic costs associated with alien-invasive species in the United States. Ecol Econ 52(3):273–288
Ruth M, Amato A, Kirshen P (2006) Impacts of changing temperatures on heat-related mortality in urban areas: the issues and a case study from metropolitan Boston. In: Smart growth and climate change: regional development, infrastructure and adaptation. Edward Elgar Publishing, Cheltenham
Ruth M, Coelho D, Karetnikov D (2007) The US economic impacts of climate change and the costs of inaction. Resource document. http://www.cier.umd.edu/climateadaptation
Small C, Cohen JE (2004) Continental physiography, climate, and the global distribution of human population. Curr Anthropol 45(2):269–277
Stanton EA, Ackerman F, Kartha S (2009) Inside the integrated assessment models: four issues in climate economics. Stockholm Environment Institute, Working Paper WP-US-0801, Stockholm
The World Bank (2010) Climate risks and adaptation in Asian coastal megacities. World Bank, Washington, DC
United Nations Environment Programme (2008) Report on the Latin American and Caribbean Initiative for Sustainable Development (ILAC): five years after it was adopted. Sixteenth meeting of the Forum of Ministers of Latin America and the Caribbean, Santo Domingo
United Nations Human Settlements Programme (2011) Global report on human settlements 2011: cities and climate change. Earthscan, London/Washington, DC
Weitzman ML (2011) Fat-tailed uncertainty in the economics of catastrophic climate change. Rev Environ Econ Policy 5(2):275–292
World Bank (2011) World development indicators & global development finance. World Bank. Resource document. http://databank.worldbank.org/ddp/home.do

Urban and Regional Sustainability
Emily Talen

Contents
1 Introduction
2 Principles
3 Sustainable Cities and Regions
4 Implementing Sustainability Goals
5 Conclusion
6 Cross-References
References

Abstract

Sustainability has become a key concept in the quest to define a normative framework for urban and regional development. This chapter presents an overview of what is meant by sustainability, first at the regional and then at the city level. Both scales have a long history in the planning domain, but the notion of a sustainable city is key to both realms and is the main focus of this chapter. While there is widespread agreement on broad parameters and principles of urban and regional sustainability, there are entrenched debates over implementation. On one level, there are debates over implementation methods, especially the degree to which partial success in implementation is better or worse than doing nothing. More fundamental debates about sustainability involve the distinction between process vs. form and the integration of city vs. nature.

Keywords

Sustainability · Urbanism · Sustainable cities

E. Talen (*)
Social Sciences Division, University of Chicago, Chicago, IL, USA
e-mail: [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_56


1 Introduction

On the subject of urban and regional sustainability, the debate is no longer whether cities and regions should be less environmentally harmful and more human-scaled, but what the specific policy and design responses should be – whether government subsidies and funding priorities, market incentives, new kinds of codes, transportation systems, or urban design schemes are achieving what is needed. Our views of the sustainable city and region have evolved from "what is it?" to "how do we get there?" in much the same way that many environmentalists decided several years ago that the debate over global warming had been settled, despite the continuing pushback from the other side. Although the concepts and terminology used in the literature on sustainable cities have become more complex (Fu and Zhang 2017; Bibri and Krogstie 2017), most urban planners would argue that, in principle and in broad terms, we know what the sustainable city and region are supposed to be and what the economic, social, health, and environmental benefits of sustainable cities and regions could potentially be. Berke and Conroy define sustainable development as "a dynamic process in which communities anticipate and accommodate the needs of current and future generations in ways that reproduce and balance local social, economic and ecological systems, and link local actions to global concerns" (Berke and Conroy 2000, p. 23). With these broad principles in mind, the focus, in the Western world and in the USA especially, is squarely on implementation.

This chapter first reviews the generalized principles of sustainable cities and regions, moving from a broader review toward a more specific definition. It begins with the basic principles and then spells out what those principles might mean for the physical form and pattern of cities. It will become clear that, as we move from broad principles to specific design strategies, the degree to which planners agree becomes increasingly strained. The second part of the chapter focuses on the key debates within the literature on sustainable cities. Although planners largely agree on what a sustainable city should be at a certain level, there continue to be entrenched disagreements about the best approach to getting there.

2 Principles

We can start with the meta-principle: the notion of "sustainability" from the often-quoted Brundtland Report is that "Sustainable development is development that meets the needs of the present without compromising the ability of future generations to meet their own needs" (World Commission on Environment and Development 1987, p. 43). Sustainability involves adopting a lifestyle "within the planet's ecological means" to ensure that development does not compromise the needs of future generations and that population growth is "in harmony with the changing productive potential of the ecosystem." Urban and regional planners have translated this to mean that cities must endure environmentally, economically, and socially, balancing what have come to be known as the three "E"s: environment, economy, and equity (Berke 2002, p. 30; see also Campbell 1996, 2016). Basically, this means that planners should help cities develop in ways that last. These ideas were codified in the United Nations' New Urban Agenda (UN 2017).

Often sustainability is cast as a continuation of environmental planning. While planners seem to agree that sustainability requires a holistic view that includes almost equal concern for environmental, economic, and social sustainability, the environmental perspective dominates. Thus urban transport is to be energy efficient, solar power is to be promoted where possible, and water is to be used efficiently – in short, cities are to be redefined as "eco-technical systems." Cities are going "green," and thus Routledge's comprehensive four volumes entitled Sustainable Urban Development are principally devoted to environmental assessment. Out of this larger environmental focus, many subtopics have evolved that are of particular relevance to urban planners, like the relationship between sustainability and technology, sustainability and architecture, and even "sustainable Olympic games." The link between sustainability and climate change in urban and regional planning has not been a strong focus and is only now receiving more attention (Berke 2016). Progress toward achieving sustainable urban development has been slow, and researchers lament the inability to better integrate development processes in a more holistic way.

Sustainable development can engage a complex array of political views that range from "free market" environmentalism to ecofeminism, animal rights, and bioregionalism. The three-way conflict between environmentalism, economic development, and social justice – green cities, growing cities, and just cities, as Campbell (1996, 2016) refers to them – is present in all of these approaches, and each manifests a humans vs. nature duality to varying degrees (this duality is discussed in more detail at the end of this chapter). Proposals include "greening the market," liberal environmentalism in the tradition of John Rawls, ecosocialist theory, or the biological rooting of culture through "reinhabitation." In many of these applications, there remains a fundamental, lingering duality that conceptualizes an environmental crisis in human vs. nature terms. While these concepts have found their way into the rhetoric of metropolitan development reform, there is a significant question about the degree to which rhetoric is being translated into actual practice. At the neighborhood planning level, there is a significant disconnect between rhetoric and reality. Sustainability is a concept endorsed by both economic development proponents and radical ecologists, and, as Campbell (1996) points out, "any concept fully endorsed by all parties must surely be bypassing the heart of the conflict." Cultural theorists who study the social construction of nature have argued that sustainability is simply another version of the "recovered garden" consisting of biodegradable industries, preservation of pristine wilderness, and social justice that finally achieves the "End Drama," a "postpatriarchal, socially just ecotopia for the postmillennial world of the twenty-first century" (Merchant 1996).

Sustainable cities require that economic, environmental, and social needs be balanced and interconnected. Sustainability is based on the idea that it is necessary to find the proper balance between human-made and natural environments. This constitutes a new brand of environmental thinking.
Under what is sometimes branded "the new urban ecology," cities are no longer viewed as necessarily detrimental but are in fact part of the solution to environmental problems (Childers et al. 2014). To make these kinds of ideals relevant, planners have translated urban impacts using concepts like "carrying capacity" to promote the idea that metropolitan development should not consume resources faster than they can be renewed, or generate more waste than natural systems can process. Similarly, the "ecological footprint" is used to measure sustainability by calculating the amount of resources consumed, postulating that sustainable development requires reducing human consumption to levels that do not exceed the ability of ecosystems to provide for it. This method has lately been criticized for failing to fully account for the trade-offs and benefits of compact urban form, among other things. The ecological footprint may foster human vs. nature duality because of its emphasis on establishing a causal link between cities and accelerated global ecological decline. More useful is the notion of "ecosystem services," a method that quantifies the benefits humans receive from nature and shows how strategies like stormwater management and erosion control can offset the environmental consequences of development.
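To illustrate the accounting logic behind carrying capacity and the ecological footprint, the following minimal sketch compares per-capita resource demand against available biocapacity, both expressed in global hectares. The figures are invented for illustration, not measured values from any footprint study.

```python
# Hypothetical sketch of ecological-footprint accounting: demand and
# biocapacity are both expressed in global hectares (gha) per capita.
def footprint_ratio(consumption_gha: float, biocapacity_gha: float) -> float:
    """Return the ratio of footprint to biocapacity; a value above 1.0
    means resources are consumed faster than ecosystems renew them."""
    return consumption_gha / biocapacity_gha

# Illustrative (made-up) per-capita figures for a hypothetical region.
consumption = 4.8   # gha per person demanded by consumption and waste
biocapacity = 1.6   # gha per person of productive land and sea available

ratio = footprint_ratio(consumption, biocapacity)
print(f"Footprint/biocapacity ratio: {ratio:.1f}")
if ratio > 1.0:
    print("Consumption exceeds renewable capacity: unsustainable "
          "by the footprint criterion.")
```

The same single-ratio logic is what the critics cited above object to: it registers aggregate demand but not, for example, the per-capita savings that compact urban form can deliver.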

3 Sustainable Cities and Regions

The sustainable region and the sustainable city are intertwined for one obvious reason: the sustainable city is almost always discussed in terms of its regional context. In a planning sense, regionalism is about the pattern of human settlement (villages, towns, and cities) set in protected open space. Anyone advocating the development of self-contained units of human settlement knows that these units must be positioned geographically but also that it is necessary to think of them in terms of an integrative framework. On this point there is little disagreement, and the idea has been operative since the regionalist perspective applied to city planning came to fruition more than 100 years ago. Looking at the world from a regional perspective began even earlier and can be traced back as far as the eighteenth century, when the natural and cultural geography of Europe was particularly suited to regional differentiation. A number of definitions evolved, ranging from a focus on human economy and cultural distribution to the identification of natural boundaries. Peter Hall (2002) points out that the idea of towns of limited population surrounded by agricultural green belts is a recurrent theme, found in the writings of Ledoux, Owen, Pemberton, Buckingham, Kropotkin, and more; Saint-Simon and Fourier had cities arranged within a regional complex.

Sustainability at the regional level revolves around a related idea – that issues like housing, transportation, and the environment, and the political governance of each, must be treated as an interconnected, multi-jurisdictional whole. There is a need to balance human activity and nature by keeping settlement at the proper scale and level of self-sufficiency. The Regional City envisaged by contemporary regionalists (Calthorpe and Fulton 2001) conceptualizes the "emerging region" as a revitalized central city coexisting with strengthened suburbs and preserved natural areas.

In the USA, sustainable cities are a more common conception than sustainable regions, although in Europe this is not the case. In all regions, planners seem especially in agreement about what an unsustainable city looks like. There is no disputing that malls surrounded by parking lots, disconnected apartment complexes, and vast expanses of low-density detached housing – i.e., sprawl – have a higher carbon footprint than attached buildings in a walkable context. The book Green Metropolis by David Owen (2009) presented the most thorough documentation of this fact to date: compact neighborhoods bring with them the intrinsic environmental, social, and economic benefits of living smaller, driving less, lowering energy costs, strengthening social connection, and fostering networks of economic interdependence. The book Sustainable Urbanism by Doug Farr (2008, p. 10; see also Farr 2018) provided an even more explicit vision: "Sustainable urbanism is an integration of walkable and transit-served urbanism with high-performance buildings and high-performance infrastructure." Planners largely agree that the sustainable city is more than just green buildings and pervious pavement; it involves the design of walkable communities along with the connection to transit, food, and amenity they require.

How does urban planning support sustainable ideals more specifically? Of course, the concept of a "sustainable city" includes more than the physical and infrastructure qualities of built form. In particular, institutional strategies like recycling programs, local governance, and civic participation are considered important for promoting sustainable cities. And always, green building and infrastructure technologies – efficiencies in structural design, energy use, and materials, as well as green infrastructure – are an important part of the task of city building.

We can summarize the key principles that promote the sustainable city from an environmental point of view. Cities must (a) lower vehicle miles traveled (VMT), limiting carbon emissions by looking for ways to reduce reliance on fossil fuels (cars) and increase reliance on clean transportation (e.g., bus rapid transit, light rail); (b) lower energy costs by reducing infrastructure, like highways and utility lines, which in turn results in lower transmission losses; and (c) limit damage to natural environments by lowering impervious surfaces and runoff, compacting development, and reducing disruption of biodiversity and natural habitat. Sustainable industrial and energy systems, food production, and mitigation of heat island effects are also essential. Sustainable cities also promote "green streets" that handle stormwater within their right-of-way, contain visible green infrastructure, and maximize street trees to improve air quality, reduce temperatures, and absorb stormwater. Sustainable cities support passive solar design, sustainable stormwater practices, organic architecture, the harnessing of waste heat, and the protection of biodiversity corridors. Local jurisdictions in the USA have been attempting to incorporate sustainability into their activities, regulations, and development approval processes, promoting eco-industrial park development, bicycle ridership programs, point systems for green architecture, or the use of sustainability indicators.


That’s just the environmental side. To endure economically, the sustainable city needs to foster diverse economic networks of interconnected relations, a view that Jane Jacobs famously advocated in The Death and Life of Great American Cities (1961). The basic idea is that a mixture of activities, people, and land uses are the key to successful places. Allan Jacobs and Donald Appleyard wrote a widely cited manifesto in which they argued that diversity and the integration of activities were necessary parts of “an urban fabric for an urban life.” The maximizing of “exchange possibilities,” both economic and social, is viewed as the key factor of urban quality of life and, now, sustainability. What counted for Jane Jacobs was the “everyday, ordinary performance in mixing people,” forming complex “pools of use” that would be capable of producing something greater than the sum of their parts (Jacobs 1961, pp. 164–165). Thus sustainable cities are tied to, or ultimately derived from, social and economic diversity. The book Building Sustainable Urban Settlements (Romaya and Rakodi 2002), for example, lists “mixed land uses” first under its set of principles for building sustainable settlements. Reduction of travel costs, and therefore energy consumption, is usually a primary motivation. The “land use–transport connection” is put forth as a counter-response to the problem of non-diversity, i.e., functional isolation. A mixture of land uses has been shown empirically to encourage nonautomobile-based modes of travel such as walking and bicycling, which in turn are seen as having a positive impact on public health. An economically sustainable city is one that fosters opportunity. In Jacobs’ words, cities, if they are diverse, “offer fertile ground for the plans of thousands of people” (Jacobs 1961, p. 14). Research has focused on resource efficiency, jobs/ housing balance, agglomeration economies, and socioeconomic interdependence, all of which urban form could impact. Agglomeration economies rise with both firm and employment densities. On the macroscale, research has focused on the need for regional transportation networks to support linkages among urban areas. Such “polycentricity” may increase transit ridership, reduce vehicle miles traveled, and lead to agglomeration economies. Non-diversity offers little hope for future expansion, either in the form of personal growth or economic development. Nor are non-diverse places able to support the full range of employment required to sustain a multifunctional human settlement. Diversity of income and education levels means that the people crucial for service employment, including local government workers (police, fire, school teachers), and those employed in the stores and restaurants that cater to a local clientele, should not have to travel from outside the community to be employed there. This brings us to the third dimension – that of social sustainability. Social sustainability (e.g., equity and justice, exposure and vulnerability, and other dimensions of community wellbeing) tends to be treated less rigorously than environmental or economic sustainability (Sampson 2013). A large body of literature explores this problem theoretically and through case studies to reveal the problem’s complexity (Agyeman 2005). As with economic sustainability, diversity is seen as a key variable. A socially diverse city – one that avoids differentiation of social groups into segregated housing

Urban and Regional Sustainability

1567

enclaves – ensures better access to resources for all social groups, providing what is known as the “geography of opportunity.” Diversity builds social capital of the bridging kind by widening networks of social interaction. Where there is less social diversity and more segregation, there is likely to be less opportunity for the creation of these wider social networks. This could be a significant disadvantage for segregated neighborhoods and could even have the effect of prolonging unemployment. While socially diverse neighborhoods continue to be seen as essential for broader community wellbeing and social equity goals, the connection to sustainability is also made – mixing incomes, races, and ethnicities are believed to form the basis of “authentic,” sustainable communities. In addition to mixed housing type, land uses that complement each other to promote the active use of neighborhood space at different times of the day will create “complex pools of use” (Jacobs 1961), a component of natural surveillance and social sustainability. Supporting this are findings that a mix of neighborhood public facilities plays a role in reducing crime. Studies of socially mixed neighborhoods consistently identify urban form as a key factor in sustaining diversity. It is possible to focus more specifically on the human-built dimensions of urban form – streets, lots, blocks, land uses, buildings, and the patterns they create. Researchers have proposed varying measures of urban form, defined as compactness, centralization, concentration, intensification, and mix of uses, and may encompass land system architecture and measures such as size, shape, density, access, connectivity, street pattern, block arrangement, and dispersal (Clifton et al. 2008). An ecologically attuned measure of urban form might include degree of concentration as well as land-use patterns that incorporate threshold and interaction effects. Compactness is a kind of meta-principle for sustainable urban form. All of the environmental principles of sustainability suggest or even require it (Bay and Lehmann 2017). Some of this is obvious. Compactness means that there will be fewer highways, greater transit feasibility, greater opportunities for combined heat and power, and lower pumping requirements for water and sewer. Compact form can lead to less energy consumption as less space requires less heating and cooling, in addition to correlating with more opportunities for combined heat and power, lower pumping requirements for water and sewer, and gray and recycled water systems. However, compact urban form can also increase heat island effects and, with it, a potentially accelerated demand for cooling services (Stone 2012). On the other hand, low-density development has been linked to higher infrastructure costs, increased automobile dependence, and air pollution. Density has been seen as an essential factor in maintaining walkable, pedestrian-based access to needed services and neighborhood-based facilities, as well as a vibrant and diverse quality of life. Walkable access to services is an essential part of the sustainability equation because people living in well-serviced locations will tend to have lower carbon emissions (Ewing and Cervero 2017). The higher the access to opportunities like jobs and services, the lower the transport costs. Related to this, sustainable urban form is defined by the degree to which it supports the needs of pedestrians and bicyclists over car drivers. 
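As one illustration of how such measures are operationalized, the sketch below computes an entropy-based land-use mix index of the kind commonly used in this literature, normalized so that 0 indicates a single land use and 1 a perfectly even mix. The land-use shares are invented for the example; this is a generic illustration, not a measure taken from Clifton et al. (2008).

```python
# Hypothetical sketch: an entropy-based land-use mix index, one of the
# standard operational measures of urban form discussed above.
import math

def land_use_entropy(shares):
    """Entropy index normalized to [0, 1]: 0 = a single land use,
    1 = land area split evenly across all present uses."""
    shares = [s for s in shares if s > 0]        # ignore absent uses
    k = len(shares)
    if k <= 1:
        return 0.0
    h = -sum(s * math.log(s) for s in shares)    # Shannon entropy
    return h / math.log(k)                       # normalize by maximum

# Invented land-area shares for a hypothetical neighborhood:
# residential, commercial, office, institutional, open space.
shares = [0.55, 0.15, 0.10, 0.10, 0.10]
print(f"Land-use mix entropy: {land_use_entropy(shares):.2f}")
```

In practice, such an index would be computed from parcel- or grid-level land-use data and combined with density, connectivity, and other form measures.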
Compactness is a kind of meta-principle for sustainable urban form. All of the environmental principles of sustainability suggest or even require it (Bay and Lehmann 2017). Some of this is obvious. Compactness means that there will be fewer highways, greater transit feasibility, greater opportunities for combined heat and power, and lower pumping requirements for water and sewer. Compact form can also lead to less energy consumption, as less space requires less heating and cooling, and it correlates with more opportunities for gray and recycled water systems. However, compact urban form can also increase heat island effects and, with them, a potentially accelerated demand for cooling services (Stone 2012). Low-density development, on the other hand, has been linked to higher infrastructure costs, increased automobile dependence, and air pollution. Density has been seen as an essential factor in maintaining walkable, pedestrian-based access to needed services and neighborhood-based facilities, as well as a vibrant and diverse quality of life. Walkable access to services is an essential part of the sustainability equation because people living in well-serviced locations will tend to have lower carbon emissions (Ewing and Cervero 2017). The higher the access to opportunities like jobs and services, the lower the transport costs. Related to this, sustainable urban form is defined by the degree to which it supports the needs of pedestrians and bicyclists over car drivers. This has been motivated by a concern over the effect of the built environment on physical activity and human health. Streets that are pedestrian-oriented are believed to have an effect not only on the quality of place but on the degree to which people are willing to walk. Researchers have argued that activity levels can be increased by implementing small-scale interventions in local neighborhood environments, and a whole catalog of design strategies is now used to make streets more pedestrian-oriented.

Finally, sustainable urban form is associated with what could be termed polycentric or multinucleated urbanism – the idea that urban development should be organized around nodes of varying sizes. Whereas sprawl tends to spread across the landscape uniformly, sustainable urban form has a discernible hierarchy to it – from regional growth nodes to neighborhood centers or even block-level public spaces. At the largest scale, centers may be conceived as regionally interconnected "urban cores," with higher-intensity growth converging at transportation corridors. At the neighborhood level, nodes support sustainable urban form by providing public space around which buildings are organized.

4 Implementing Sustainability Goals

While there is wide agreement among urban planners on the principles outlined above, it is during the implementation of these ideals that tensions about sustainability are exposed. In fact there is plenty to debate, and those debates frame some of the most interesting aspects of sustainability in planning.

Debates about sustainability in planning are often a matter of degrees. Studies of the connection between, say, urban form and travel behavior, or between urban form and health, may admonish planners for failing to see the full complexities involved in linking particular forms to sustainability outcomes, but they are unlikely to call for the wholesale reversal of the basic idea that compact, diverse, walkable urbanism supports sustainability. Between business-as-usual sprawl development of real estate product types and urban planners seeking compact urbanism, there is a significant divide. Debates on the planning side center on how much compactness, walkability, and diversity of land use, not on whether compactness and diversity are important goals (Bay and Lehmann 2017). While it is entirely possible to discuss the level of social sustainability – in terms of equity, justice, and social capital – that might be impacted by alternative urban forms, these too are a matter of degrees (Ancell and Thompson-Fawcett 2008).

One source of debate is over the degree to which partial success in implementation is better or worse than doing nothing. Ancell and Thompson-Fawcett raise a legitimate question about whether intensifying parts of a city makes the city more sustainable: "if this results in diminished opportunities for lower-income groups to live in the central city, is such intensification necessary or sufficient as a basis for social sustainability with respect to planning for housing?" (2008, p. 427). One response is that this might be more a matter of failed implementation than of failed principle.

But what might be more elusive and interesting is not the degree to which different sustainable urban forms achieve their desired purpose, and not whether implementation is actually occurring – although both of these questions are critical and continue to engage researchers – but whether there are debates that are even more fundamental. There are two debates in particular: process vs. form and city vs. nature.

The first concerns the tension between flexibility and open-ended process vs. preconceived forms and concrete visions expressed as ideal models of urban form. In the book The Original Green, Steven Mouzon (2009) argues that cities, to be sustainable, must embody a number of specific design principles, from walkable streets, to preservation of the embodied energy of historic buildings, to design that encourages the use of public space and the civic interaction that results. But for some, the specific design qualities needed to enhance sustainability at this level should be more open-ended and flexible – not normative. In lieu of the normative approach of new urbanists, it can be argued that sustainability in planning amounts to managing change as a continuous process. This was the approach of urban planner Kevin Lynch, who wrote in the book Good City Form: "The good city is one in which the continuity of this complex ecology is maintained while progressive change is permitted. The fundamental good is the continuous development of the individual or the small group and their culture: a process of becoming more complex, more richly connected, more competent, acquiring and realizing new powers – intellectual, emotional, social, and physical" (Lynch 1981, p. 116). Planners and designers who agree with this view are against "a steady state with respect to human-environment relations" and instead devote their energies to promoting the best possible process.

Process and form are not necessarily in opposition. New urbanists would contend that both are needed, relying on the charrette process to implement their model of sustainable urban form. But many planners, while they would agree with the basic outlines of what a sustainable city looks like, are more interested in ensuring a sustainable process for getting there. Phil Berke summed up what the authors in a special issue on green communities in the Journal of the American Planning Association called for: "collaborative planning processes aimed at strengthening and mobilizing social networks to support green community initiatives, requirements and incentives that stimulate greener community and household behaviors, and new assessment tools for green building rating, and greenhouse gas inventory and analysis" (Berke 2008, p. 393) – in other words, process, procedure, and assessment rather than specific design ideals. Largely this entails prioritizing resident views: "it is important to avoid undertaking research with pre-conceived notions as to whether impacts of urban compaction such as smaller houses are negative or positive, and instead to let the residents speak for themselves" (Ancell and Thompson-Fawcett 2008, p. 440). Others have argued that the sustainable city is being thwarted by "unresponsive bureaucratic procedures" that are ill prepared to deal with the reality that "sustainable development is political rather than analytical" and that overly pragmatic policy solutions (i.e., urban design ideals) might forever frustrate the value-laden complexity of sustainability (Batty 2006, p. 38). In the architecture field, models of sustainable urban form that appear to be universalist (such as those of the new urbanists) are rejected. Architects are especially "skeptical of the assumption that a single approach, model, or list of best practices can be universally applied," arguing instead for a "much needed transdisciplinary conversation to emphasize the long-term consequences of our actions, not their ideological or disciplinary purity" (Pragmatic Sustainability: Theoretical and Practical Tools, http://www.routledge.com/books/search.asp).

An even more fundamental debate concerns the relationship between cities and nature, which can be described as the "human vs. nature duality." There is a long history to this in urban planning, and the focus on sustainability has not escaped it. It comes from the regionalism of early twentieth-century botanist Patrick Geddes, who viewed metropolitan development as dependent upon knowledge of the large-scale, regional complexities of the landscape and the human response to that landscape. However, early regionalists believed no synthesis between existing metropolitan development and nature was possible. This imbalance came to epitomize the view that large metropolitan areas were the antithesis of environmental conservation. Historian William Cronon explored the phenomenon of separating human and natural worlds in the book Uncommon Ground: Rethinking the Human Place in Nature (1996). He argued that wilderness, the "ideological underpinning" of the environmentalist movement, is a highly problematic concept because it is viewed as something wholly separate from ourselves. Even the opening line of the Union of Concerned Scientists' Warning to Humanity (1992) included the premise of separation; it begins: "Human beings and the natural world are on a collision course."

What may be the most lucid example of human/nature duality in planning is the way in which the "greening" of human places is interpreted as something unilaterally positive for the environment, regardless of broader impacts. There may be a failure to recognize that metropolitan development patterns that appear "natural" in the suburban landscape actually disrupt natural systems. In fact, maintaining green spaces may be harmful both in direct ways (through soil compaction, irrigation, and the need for chemical treatment) and in indirect ways – increasing atmospheric pollution through increased automobile use caused by spreading out the urban pattern. In short, interweaving green spaces through human settlement may sometimes be more harmful than not when viewed at a larger scale. Somewhat ironically, the most environmentally sound pattern of human settlement – in some cases – may be the one with lower rather than higher levels of green space.

This tension between cities and nature has been identified by Godschalk and others as the "green cities conflict." It is essentially a conflict over the degree to which natural vs. human connectivity is to be prioritized. New Urbanism has been criticized for failing to accommodate more environmental sensitivity (Berke 2002), and this often boils down to its focus on maintaining urban connectivity. But new urbanists counter that environmental regulations may inadvertently thwart compact urban development, including suburban retrofits. They point to a potential problem with low impact development, which is attempting to replace the old stormwater system approach of "pave, pipe, and dump engineering" with something more eco-friendly. The old system resulted in high runoff rates, volumes, and pollutant loads and needed to be changed.
But new stormwater regulations, currently being advocated by the US Environmental Protection Agency (EPA), might actually incentivize sprawl and reduce retrofits by requiring each site to “emulate the natural hydrologic conditions of the site.” Since this is far easier to achieve on greenfield sites but onerous on redeveloped sites, the urbanization of existing places may ultimately be avoided. An alternative to the low impact development approach has been proposed in the Light Imprint Handbook: Integrating Sustainability and Community Design (http://www.lightimprint.org). The handbook lays out environmentally friendly stormwater management and includes 60 techniques for “paving streets and walkways, channeling and storing water, and filtering surface runoff before release into the aquifer.” It is described as “an intrinsically green design strategy” that not only sustains compact urban places but also “respects site terrain.” The approach is based on the idea of an urban-to-rural transect as a way of maintaining the proper interconnections among urban elements – a balanced mix of landscape, building type, and streetscape, for example.

How should compact urban development integrate with green infrastructure practice? Projects that are greenfield rather than infill, disconnected from existing infrastructure, and not particularly concerned with stormwater runoff and the restoration of wetlands are problematic. But it is also true that environmental “Best Management Practices” and “low impact development” can result in a lack of urban connectivity, undermining the ability to develop walkable, diverse, compact places – sustainable urban form.

5 Conclusion

There is an interesting overlap between the two debates discussed above: the need to balance process vs. form and the need to integrate city vs. nature may ultimately converge in our approach to sustainable urban planning. For example, it could be argued that creating visually explicit models of future development and providing an inclusive process are both needed to help resolve the human vs. nature duality problem. Perhaps development that is represented tangibly (as compact urban neighborhoods) can help overcome human/nature duality by helping people visualize how development that meets human needs can also protect natural areas. If urban development is recognized as something other than a zero-sum game with trade-offs between social and environmental goods, normative visions of development can be used to illustrate the possibilities. And an inclusive process that allows flexibility and the exploration of alternative proposals is needed not only to ensure that development actually addresses human/nature integration but also to ensure that it does so in a way that makes sense to people.

The definition of the sustainable city and region is, in principle, resolved. The question is how to get there: via a reliance on the right process that guides city building toward a more sustainable outcome, or via a stronger articulation of what the sustainable city and region is supposed to be? Via an urban development approach that prioritizes natural systems, or one that allows natural systems to be trumped in some cases in order to promote urban connectivity and compactness? City and regional planners need to find a balance between visualized ideals and inclusive process, and between the unequivocal protection of nature and the corresponding human claim to land development. In both cases, there is a need to bring the language of integration into sharper focus. In sustainability, actions are supposed to balance natural, economic, and social concerns. Sustainability challenges us to make every decision supportive, and integrative, of each realm.

6 Cross-References

▶ Cities, Knowledge, and Innovation
▶ Climate Change and Regional Impacts
▶ Infrastructure and Regional Economic Growth
▶ New Trends in Regional Policy: Place-based Component and Structural Policies
▶ The Case for Regional Policy: Design and Evaluation

References

Agyeman J (2005) Sustainable communities and the challenge of environmental justice. New York University Press, New York, pp 79–106
Ancell S, Thompson-Fawcett M (2008) The social sustainability of medium density housing: a conceptual model and Christchurch case study. Hous Stud 23(3):423–441
Batty SE (2006) Planning for sustainable development in Britain: a pragmatic approach. Town Plan Rev 77(1):29–40
Bay J, Lehmann S (eds) (2017) Growing compact. Routledge, London. https://doi.org/10.4324/9781315563831
Berke P (2002) Does sustainable development offer a new direction for planning? Challenges for the twenty first century. J Plan Lit 17(1):22–36
Berke P (2008) The evolution of green community planning, scholarship, and practice. J Am Plan Assoc 74(4):393–408
Berke P (2016) Twenty years after Campbell’s vision: have we achieved more sustainable cities? J Am Plan Assoc 82(4):380–382. https://doi.org/10.1080/01944363.2016.1214539
Berke P, Manta-Conroy M (2000) Are we planning for sustainable development? An evaluation of 30 comprehensive plans. J Am Plan Assoc 66(1):21–34
Bibri SE, Krogstie J (2017) Smart sustainable cities of the future: an extensive interdisciplinary literature review. Sustain Cities Soc 31:183–212. https://doi.org/10.1016/j.scs.2017.02.016
Calthorpe P, Fulton W (2001) The regional city: planning for the end of sprawl. Island Press, Washington, DC
Campbell S (1996) Green cities, growing cities, just cities? Urban planning and the contradictions of sustainable development. J Am Plan Assoc 62(3):296–312
Campbell SD (2016) The planner’s triangle revisited: sustainability and the evolution of a planning ideal that can’t stand still. J Am Plan Assoc 82(4):388–397. https://doi.org/10.1080/01944363.2016.1214080
Childers DL, Pickett STA, Grove JM, Ogden L, Whitmer A (2014) Advancing urban sustainability theory and action: challenges and opportunities. Landsc Urban Plan 125:320–328. https://doi.org/10.1016/j.landurbplan.2014.01.022
Clifton K, Ewing R, Knaap GJ, Song Y (2008) Quantitative analysis of urban form: a multidisciplinary review. J Urban 1(1):17–45
Cronon W (ed) (1996) Uncommon ground: rethinking the human place in nature. W.W. Norton & Co., New York
Ewing R, Cervero R (2017) “Does compact development make people drive less?” The answer is yes. J Am Plan Assoc 83(1):19–25. https://doi.org/10.1080/01944363.2016.1245112
Farr D (2008) Sustainable urbanism: urban design with nature. Wiley, Hoboken
Farr D (2018) Sustainable nation: urban design patterns for the future. Wiley, New York
Fu Y, Zhang X (2017) Trajectory of urban sustainability concepts: a 35-year bibliometric analysis. Cities 60:113–123. https://doi.org/10.1016/j.cities.2016.08.003
Hall P (2002) Cities of tomorrow: an intellectual history of urban planning and design in the twentieth century, 3rd edn. Basil Blackwell, Oxford
Jacobs J (1961) The death and life of great American cities. Vintage Books, New York
Lynch K (1981) Good city form. MIT Press, Cambridge, MA
Merchant C (1996) Reinventing Eden: Western culture as a recovery narrative. In: Cronon W (ed) Uncommon ground: rethinking the human place in nature. W.W. Norton & Co., New York, pp 132–170
Mouzon SA (2009) The original green: unlocking the mystery of true sustainability. The New Urban Guild Foundation, Miami
Owen D (2009) Green metropolis: why living smaller, living closer, and driving less are the keys to sustainability. Riverhead Books, New York
Romaya S, Rakodi C (2002) Building sustainable urban settlements: approaches and case studies in the developing world. ITDG Publishing, London
Sampson R (2013) When disaster strikes, it’s survival of the sociable. New Sci 218(2916):28–29
Stone B Jr (2012) The city and the coming climate: climate change and the places we live. Cambridge University Press, New York
Union of Concerned Scientists (1992) World scientists’ warning to humanity. Union of Concerned Scientists, Cambridge, MA. http://www.ucsusa.org/
United Nations (2017) Resolution adopted by the General Assembly on 23 December 2016: 71/256, “New Urban Agenda”. United Nations, New York
World Commission on Environment and Development (the Brundtland Commission) (1987) Our common future. Oxford University Press, Oxford

Population and the Environment

Jill L. Findeis and Shadayen Pervez

Contents
1 Introduction
2 Century of Dramatic Population Growth and Population-Environment Theories
3 Regional Differentials
4 Micro Perspective on Population Decisions and Community Resilience and High-Level Effects
5 Conclusions
6 Cross-References
References

Abstract

The impact of human population growth on the environment represents the major challenge of our time. This chapter reviews demographic change over the last century, set in historical context, along with different perspectives on population-environment interactions. Differences in population growth rates and demographic change across space are explored, followed by perspectives on the population-environment nexus at multiple scales, with a particular focus on those contexts where the impacts on humankind are likely to be greatest. The alignment of individual and higher-level actions resulting in environmental impacts, and the negative force of those impacts, are important for signaling the need to change behaviors. Interrelationships are shown to be highly complex. It is argued that multidisciplinary efforts to tackle complexity and to focus on resilience at multiple scales are critically needed, underscoring the importance of multidisciplinary regional science thought. The question is raised, however, whether these efforts will be coordinated well enough across multiple scales and with multiple disciplines and publics to avoid what could be catastrophic impacts. These are most likely to occur at local and regional scales where population growth rates are high, natural environments are already vulnerable, and resilience is limited.

Keywords

Population growth · Population growth rate · Vicious cycle · Environmental Kuznets curve · Human population growth

J. L. Findeis (*) · S. Pervez
Division of Applied Social Sciences, University of Missouri-Columbia, Population Research Institute, Pennsylvania State University, Columbia, MO, USA
e-mail: fi[email protected]; [email protected]; [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_58

1 Introduction

Vitousek et al. (1997), among many other scholars, convincingly argue that the impact of human population growth on the environment represents the major challenge of our time. Increasingly, humans dominate the landscape, driven by population growth, its distribution, and affluence. Population growth rates over the last century are unprecedented, stemming from multiple factors, including higher survival rates of the young and the old that have outpaced falling birth rates. World population in 2018 is estimated at 7.6 billion (PRB 2018). Population Reference Bureau (PRB 2010) projections to 2050 indicate a doubling of the world’s population from 2010 levels in the least-developed countries (i.e., 2.0 times more), 1.6 times more in the less-developed countries excluding China, 1.4 times more in the less-developed countries with China included, and 1.1 times more in the more-developed countries. Simultaneously, economic growth and development have been attained in many regions of the world previously considered – even 50 years ago – almost endlessly trapped in poverty. The physical environment, which historically kept human population growth in check, has been altered to the point that many now question whether, and which, individual and collective human actions are needed to exert the check that the environment previously – and harshly – provided. The tables are turned; human population growth, expected to continue over the next century, threatens the natural environment even as overall rates of population growth are projected to decline.

Caveats are in order. The long-term perspective is important, as documented by historian J. R. McNeill (2006). And the epidemiological and agricultural transitions that contributed to higher survival rates are known to be reversible. However, Campbell (2007) challenges us when she observes that recent literature across multiple disciplines remains remarkably silent on the population growth issue, in part a result of conflict over the role of family planning. What is clear is that the complexity of the population-environment nexus and its underlying mechanisms poses major challenges for scientists across a wide spectrum of disciplines.

Ehrlich and Holdren (1971) summarized the complexity of the population-environment nexus in the famous I = PAT equation. Here, the aggregate environmental impact (I) is the product of the size of the population (P), per capita consumption (A, for affluence), and the environmental impact of a unit of consumption, determined by the level of technology (T). The basic assumption is that environmental impacts, whether the level of pollution or the depletion of the natural resource base, are functions of the total demand for goods and services in the economy. Technological improvement implies the discovery of ways to substitute natural resources for one another and to substitute ideas and manufactured capital for natural resources. This enables higher levels of production and consumption without making higher demands on the natural resource base. IPAT is evidently too simplistic: it does not account for interactions among the variables. For instance, technology can become more efficient as the level of affluence increases. Another serious shortcoming is that it specifies impact as a simple multiplicative relationship, ignoring very important threshold effects. But despite these simplifications, IPAT points out the ruthless logic of the well-known Malthusian framework: an ever-growing population and level of affluence will continue to take a greater toll on the earth’s natural resource base and ecosystem. To what extent technological progress can stretch the eventual limit of the earth’s carrying capacity is a source of continuing debate, but the simple IPAT equation underscores the problem of assuming an infinitely elastic limit.

Viewed thus from a Malthusian perspective, IPAT points to an impending “population problem.” Scientists from different disciplines have different ideas on why the problem exists. For economists, one challenge is to explain why the problem at the aggregate level cannot be solved at the level of individual rational actors. Given the constraint on the natural resource base, rational actors should adjust their reproductive and consumption behaviors so that optimum levels of population and consumption can be perpetually maintained. Of course, this does not happen in reality, in part because the private and social benefits and costs of reproduction are not the same. Reproduction generates crowding externalities for others, which reproducing individuals acting on their own are not expected to take into account. One likely source of these crowding externalities is the obvious finiteness of space. Thus, crowding externalities give population-environment interaction a spatial perspective, making it of paramount interest to every regional scientist.

The overall population growth trend is coupled with an apparent unevenness in population growth rates across regions. The increases in population projected in the literature for some countries are quite astounding. As documented by the Population Reference Bureau (2018), countries with the largest projected absolute increases in population between 2018 and 2050 include India (308.8 million), Nigeria (214.7 million), and the Democratic Republic of the Congo (131.6 million). At the same time, other countries are expected to see their populations decline over this period, including China, Japan, Russia, Ukraine, Romania, Poland, Germany, and Thailand (PRB 2018). Focusing on fertility, the Population Reference Bureau (2018) reported a total fertility rate (TFR) – the average number of children born per woman over her lifetime – of 1.6 in the more-developed countries, compared to a TFR of 4.2 across the least-developed countries of the world. The PRB (2018) reports the highest 2018 total fertility rates for Niger (7.2), followed by Chad (6.4) and the Democratic Republic of the Congo (6.3). The estimated population-replacement TFR is 2.1, for comparison.
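The IPAT identity introduced above connects such demographic differentials to environmental impact. A stylized decomposition may help fix ideas (the growth-rate form below is a standard textbook manipulation, not Ehrlich and Holdren’s own notation): taking logarithms and time derivatives turns the multiplicative identity into additive growth rates,

I = P \cdot A \cdot T \quad \Longrightarrow \quad g_I = g_P + g_A + g_T, \qquad g_X \equiv \frac{\dot{X}}{X}.

On this arithmetic, a region whose population grows at 2% per year and whose per capita consumption grows at 3% per year must reduce the impact per unit of consumption by roughly 5% per year merely to hold aggregate impact constant; nothing in the identity itself guarantees such offsetting technological change.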


This unevenness has already led to human suffering in place. It has set in motion internal migration in search of food and water, and often safety. Transnational and trans-regional migration is common. While the pull of better opportunities elsewhere is a major incentive, higher than average population growth rates in already resource-poor regions of the world set in motion forces that also push population out of homelands, putting additional pressure on regions where resources are more abundant and/or native populations are declining.

The developed countries are not off the hook in terms of the impacts of their own behaviors on the natural environment. Over the past half century at least, the trend toward higher levels of consumption has been an important driver of environmental degradation. The ability to have more puts stress on the natural environment not only in the developed world but also in developing countries and regions, often important suppliers of specific natural resources. Further, imitation within developing countries can be an important driver of innovation.

Recent demographic trends raise a host of challenges to scientists and to the world’s public: how to efficiently and equitably feed the human population concentrated in urban centers, or the elderly who are becoming a larger proportion of the rural population; how to maintain human health and well-being in these places in the long run; how to sustain the natural resource base in populated spaces; and how to design appropriate policies to balance social, economic, and environmental goals. Addressing these challenges requires understanding the complexity of the human system and the larger ecosystem, which underlie the population-induced transition process. A study of recent debates in the literature on the interrelationships among human population growth, economic growth and development, and environmental impacts will convince most that the underlying system is complex, requiring the cooperative problem-solving that some believe humans may be genetically programmed to do best.

This chapter reviews selected works from multiple literatures contributing to disciplinary, multidisciplinary, interdisciplinary, and transdisciplinary efforts to understand and solve the complex challenges posed by the population-environment problem. After reviewing the historical context and documenting population growth and distribution trends, literature at different scales – micro, meso, and macro – is reviewed. The underlying question to be answered by future researchers is the following: Can a balance between humans and the environment be struck to attain a sustainable state? And, importantly, how can regional scientists contribute to the process of solving the population-environment challenge?
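Before turning to that history, the crowding-externality argument made earlier in this introduction can be made concrete with a minimal commons sketch (the notation is purely illustrative and is not drawn from the sources cited in this chapter). Let N identical households occupy a finite space S, let each household choose its number of children n_i, and let aggregate population be P. Each household’s payoff is

v(n_i) = b(n_i) - c\!\left(\frac{P}{S}\right), \qquad P = \sum_{j=1}^{N} n_j,

with b increasing and concave and c increasing in density. A household acting privately weighs only its own marginal contribution to congestion, b'(n_i) = c'(P/S)/S, whereas a social optimum internalizes the congestion imposed on all N households, b'(n_i) = N \, c'(P/S)/S. Because N > 1 and b is concave, privately chosen fertility exceeds the socially optimal level – and the finiteness of S is precisely what gives the externality its spatial character.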

2 Century of Dramatic Population Growth and Population-Environment Theories

The “Malthusian trap” has fortunately not (yet) been realized. In An Essay on the Principle of Population as It Affects the Future Improvement of Society, Malthus (1798) argued that the rate of growth of population would outstrip growth in food production, the latter being constrained by increasingly scarce land resources. He believed that human fertility could not be curtailed, that is, that humans would reproduce without restraint, although Malthus later softened his stance. Birdsall (1988) observes that while population growth occurred in the eighteenth and nineteenth centuries, growth was slow, “seldom exceeding one percent a year” (Birdsall 1988, p. 478). From 1750 to the twentieth century, the overall population growth rate averaged 0.5% per year, with rates higher in what are now the more-developed countries. While couples likely controlled their own fertility to a greater extent than we perceive, “environmental conditions” (notably disease) limited growth – to be young was to be at peril, and relatively few lived to be “elderly” as we know the term today.

It is important to emphasize at this juncture that the Malthusian emphasis on the “direct race” between population growth and food growth fails to uncover the underlying forces behind the mechanism at work. Production, after all, depends on the level of employment, not the level of population, and there is no one-to-one mapping either between population growth and employment growth or between employment growth and food production growth. Over the last two and a half centuries, changes in fertility and life expectancy obviously have changed the labor force participation rate and the employment level. Technological change also has boosted agricultural productivity, yielding higher levels of food production for the same level of employment. Higher aggregate income may not result in proportionate food growth. Further, growth in the food sector crucially depends on the distribution of income, and technological change is again instrumental in the determination of factor rewards and income distribution. Thus, the determining factor behind the Malthusian thesis is the extent to which the forces of technological change affect the key relationship between population and food growth.

Extended Life Spans, the Epidemiological Transition, and Voices of Concern. Economic growth and development – and technology in particular – extended life spans in the years following World War II. The “epidemiological transition” was built on advances in public health through disease prevention (e.g., immunization, improved sanitation) and effective forms of disease control (e.g., antibiotics). Technological advances in agriculture contributed to higher yields and greater food security in many of the world’s regions, and to arguably better nutritional status. Significantly lower mortality rates emerged, which, when coupled with continued high fertility rates, resulted in population growth rates of 2.4% per year in the 1960s (Birdsall 1988). High population growth rates were especially concentrated in the less-developed countries; Europe had less than a 1% rate of growth, North America exceeded “1.5 % only briefly” (Birdsall 1988, p. 479), and Japan’s rates also were low. However, the higher-than-replacement rates documented in almost every country of the world raised significant concern. As early as 1953, the United Nations published The Determinants and Consequences of Population Trends, and Coale and Hoover (1958) followed with Population Growth and Economic Development in Low-Income Countries. Human population growth coupled with economic growth and development over this period began to raise questions about the ability of the earth’s natural resources – the so-called Malthusian “flower pot” – to sustain the unprecedented growth.

The voice of concern became even louder in the 1960s and early 1970s, when academics from multiple disciplines focused energy and public debate on two alarming trends – ever more population and greater affluence in some regions – whose combination was producing signs of environmental degradation at multiple scales worldwide. The Population Bomb (1968), credited to Paul Ehrlich (and Anne Ehrlich); The Limits to Growth, commissioned by the Club of Rome and authored by Donella Meadows, Dennis Meadows, Jorgen Randers, and William Behrens III; and Rachel Carson’s earlier (1962) Silent Spring, among other published works, raised public awareness and alarm. Publication of the National Academy of Sciences’ The Growth of World Population in 1963 further contributed to understanding by the public and the academic community. The Limits to Growth and Jay Forrester’s World Dynamics were exercises in system dynamics and then-cutting-edge computer simulation, which gave those works an aura of scientific precision and increased respectability.

The simultaneous growth in population and affluence triggered the Neo-Malthusian alarm. The essential logic rests on the imperfect substitutability between natural and man-made resources. Given imperfect substitutability, if land and natural resources remain fixed, growth in labor and physical capital cannot sustain output growth forever; it is only a matter of time before diminishing marginal returns to labor set in and per capita production and consumption begin declining. The Limits to Growth emphasizes this point by asserting that limits to arable land will soon be reached and per capita consumption will decline, leading to famine. Even if a food crisis is avoided, the demand for industrial output will exhaust the earth’s mineral resources, and the growing pollution resulting from industrial output will exceed the earth’s absorptive capacity, eventually leading to catastrophic collapse.

Neoclassical and Cornucopian Thought. In a devastating and pointed critique, Nordhaus (1973) dismissed the models in World Dynamics and The Limits to Growth as devoid of empirical content: their predictions are highly sensitive to specification. He objected that the models were based (merely) on subjective assumptions and were never reconciled with the real world; if the assumptions did not hold, neither would the results. The dire Neo-Malthusian predictions do indeed stem from simplifying assumptions that can be challenged. Natural resources may not be fixed: new natural resources are being discovered through relentless exploration, although even here limits to growth could eventually become binding. And within a well-functioning market system, the prices of scarcer resources will rise, providing incentives to substitute away from them. The work of Ester Boserup demonstrated that the post-industrial-revolution expansion in agricultural production occurred side by side with population growth as a result of the greater substitution of capital and labor for natural resources. The Boserup hypothesis is thus the widely known claim that population growth triggers production increases through intensification – greater use of capital and labor instead of land.


From this perspective, the key barriers to reaching an optimal production level are market failures (such as insecure property rights and pervasive externalities), policy distortions, or both. The discovery of ways to substitute abundant resources for scarcer ones is itself an important form of technological improvement, and technological innovation responds to incentives in the market. The propensity of technological innovation to respond to incentives was emphasized by Julian Simon in The Ultimate Resource (1981) and Theory of Population and Economic Growth (1986), among other works. Simon argues that since human ingenuity in discovering new ideas is the ultimate resource, population growth is the greatest boon: more people mean more ideas, that is, a greater number of ways to substitute around resource constraints. A larger population also means a larger market, greater specialization, and greater per capita productivity. Thus, as opposed to the Malthusian future, Simon’s ideas prognosticate a rosy future of material abundance – hence the name cornucopian theory. The development of Nobelist Paul Romer’s new growth theory provides further impetus to the cornucopian school: investment in research and development yields increasing returns to scale because the benefits of ideas do not diminish through sharing. Yet the logical corollary of cornucopian thought is the claim that infinite growth in output makes no more than a finite demand on the natural resource base – a claim famously countered by the observation that no degree of production intensification will allow a flower pot to grow enough food to feed the entire world.

Quelling some concern, population growth rates in almost all developed countries declined to replacement rates in the 1970s, although rates in the developing world remained high. While in part a response to growing public discourse on The Population Bomb, The Limits to Growth, and Silent Spring (among others), more importantly (and pragmatically) this decline reflected the widespread availability of more effective fertility control and the return of women to paid jobs. This occurred regardless of the income level of the country; Birdsall (1988) makes the compelling case that while countries with higher incomes tend to have lower rates of fertility and mortality, many countries with low individual and household incomes have achieved lower fertility and mortality rates too. Lower rates were achieved through advances in women’s status and greater access to modern fertility control, education, and health services. Two forces defusing the concern at this stage were the following: (i) the second demographic transition, the gradual decline of fertility toward replacement level, and (ii) the environmental Kuznets curve (EKC), the empirical observation that environmental quality appears initially to deteriorate with economic growth but then to improve as income growth continues. These two phenomena working together mitigated some of the alarm of impending catastrophe and modified public and academic discourse on the population-environment problem.

Second Demographic Transition and the EKC. For most of human history, the rise of per capita income had a positive effect on population growth. Population growth in turn caused diminishing marginal labor productivity to set in, and the Malthusian check eventually reduced per capita income to near subsistence levels. Low mortality with continued high fertility, a phenomenon called the first demographic transition, similarly could have a diluting effect on income per capita. However, this dilution of per capita income was counteracted by the acceleration in technological progress and capital accumulation. An important development has been the crucial role of human capital. Galor (2005) argues that further acceleration in the rate of technological progress increased the demand for human capital in the second phase of the industrial revolution, inducing parents to invest in their children’s human capital. The rise in life expectancy also increased the rate of return to investment in human capital. While human capital investment increased productivity and the opportunity cost of time, advances in household production technology, the introduction of (more) fertility control technologies, and changes in gender norms and the institution of marriage gave women greater control over reproduction. The net effect has been Gary Becker’s quantity-quality trade-off in fertility choice. Thus began the second demographic transition of the post-Malthusian epoch, in which sustained economic growth coincided with a simultaneous decline in fertility rates.

Further, it has been observed since the 1990s that the relationship between environmental degradation and per capita income exhibits an inverted-U shape (Grossman and Krueger 1995). By analogy to the Kuznets curve between per capita income and income inequality, this relationship is aptly named the environmental Kuznets curve (EKC). The primary reason behind the EKC is perhaps structural change. Early stages of economic growth are driven by industrialization, with the growth in manufacturing giving rise to increases in the rate of pollution. Later, economic growth increases the share of the service sector, which is less harmful to the environment. In addition, the income effect on the demand for environmental quality is nonlinear: once per capita income crosses a certain threshold, consumers demand environmental quality at a much higher rate, creating an incentive for even the manufacturing sector to employ cleaner technology. Empirically, the EKC relationship has been estimated, for example, with emissions data from GEMS (the Global Environmental Monitoring System) as measures of environmental degradation; a stylized form of this regression is sketched at the end of this section. Recently, the EKC relationship has been questioned by Carson (2010), who provides empirical evidence to the contrary.

Recent Challenges. Many have challenged the cornucopian school’s optimism and again raise significant concern over lasting and irreversible environmental damage and societal and ecosystem collapse (see the contributions of Jared Diamond). The academic and public dialogue is heated but healthy. The unevenness of where population growth is taking place is creating very substantial gaps among the world’s major regions; serious gaps in food security, access to water, and access to energy resources are well documented. At the same time, Nobelist Elinor Ostrom’s well-known research shows that local systems can adapt positively to change, even for shared (common) resources; that is, adaptation to preserve the local environment is possible even in light of population growth. Whether institutions of governance at higher levels can reduce transboundary externalities (e.g., the “borderless” problems stemming from SO2 emissions), or can be designed to do so, remain critical concerns. These issues are certainly receiving much attention.
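Returning to the EKC, a minimal sketch of the reduced form commonly estimated in this literature may be helpful (the specification below is a stylized panel regression, not Grossman and Krueger’s exact model):

E_{it} = \alpha_i + \beta_1 y_{it} + \beta_2 y_{it}^2 + \varepsilon_{it},

where E_{it} is a measure of environmental degradation (e.g., an emissions concentration) for region i in year t and y_{it} is per capita income. An inverted-U requires \beta_1 > 0 and \beta_2 < 0, with the implied turning point at income y^{*} = -\beta_1 / (2\beta_2). Critiques such as Carson (2010) can be read as evidence that estimates of \beta_2, and hence of y^{*}, are fragile across pollutants, samples, and functional forms.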


Meanwhile, global population has continued to expand, and to expand rapidly, although the rate of growth is finally declining. Projections to 2100 by IIASA (the International Institute for Applied Systems Analysis) indicate that maximum global population will be attained in this century. As argued convincingly by Scherbov et al. (2011), uncertainty – stemming in part from “unreliable statistical information” from many areas of the world, but also from the inherent uncertainty of future fertility, mortality, and migration rates – casts doubt on when overall and region-specific maxima will be attained. However, they conclude that “there is little uncertainty that world population will peak and start to decline before the end of the century” (Scherbov et al. 2011, p. 575).

3 Regional Differentials

The demographic transition represents a worldwide trend (UNFPA 2018). At the same time, large regional differentials in population growth rates exist, creating regional variations in human and environmental conditions and in knowledge and perceptions of the extent and character of environmental degradation. Affluence varies widely by region and within regions. The well-known contributions of Nobelist Paul Krugman and others have provided key insights into why differentials exist and persist.

Population Reference Bureau projections to 2050 show that the populations of 26 countries, almost all in Africa, will at least double by 2050 (PRB 2018). As an example, Nigeria – now the seventh most populous country – is predicted to become the third most populous nation in the world. The UN publication World Fertility Patterns finds that over the 2010–2015 period, Africa’s total fertility rate (TFR) was 4.7 births. In comparison, the TFR was 2.2 in Asia, 2.2 in Latin America and the Caribbean, 1.9 in North America, and 1.6 in Europe. Within these broad regional categorizations, there are important variations (see UN 2015, p. 3). By 2018, “Of 43 parts of the world with fertility of four or more births per women today, 38 were in Africa” (UNFPA 2018, p. 28). In contrast, 38 countries, including China, are predicted to experience population declines by 2050, and India’s population is predicted to pass that of China over this period (PRB 2018). Importantly, within countries and regions there are very often pockets of populations with higher than average total fertility rates (typically rural) and pockets characterized by lower total fertility rates (typically urban).

Population aging, a shift in the global population toward an older age structure, is expected to continue to accelerate (PRB 2018). The PRB (2018, p. 2) observes that “Older adults’ (age 65+) share of the global population increased from 5 percent in 1960 to 9 percent in 2018 and is projected to rise to 16 percent by 2050, with the segment ages 85 and older growing the fastest. Children’s (ages 0–14) share is falling, from 37 percent in 1960, to 26 percent in 2018, with a projected decrease to 21 percent by 2050. The timing and speed of age structure changes vary by country, and these changes have important social and economic implications.”

Worldwide, convergence in the average length of life is documented (see Edwards 2011), although country exceptions exist; the challenge of HIV/AIDS clearly has had an impact on life spans. Edwards (2011, p. 499) reports that overall life expectancy in the developing world has grown by an additional 0.24 years annually as compared to the wealthier countries, more than doubling the rate in the developed world. Research on within-country variation in life expectancy at birth and other survival measures underscores the underlying variability over time. Interestingly, convergence in expected length of life at birth stands in contrast to divergence in income per capita (see studies referenced in Edwards 2011).

Finally, the structures and preferences of household units have changed, in some instances putting pressure on ecosystems. For example, Michigan State University researcher Jianguo “Jack” Liu showed that in rural China, the traditional extended household structure – with multiple families living together in the same house – is now a less-preferred lifestyle. More houses are thus built to accommodate the greater number of families preferring to live on their own, and this demographic change has been shown to pressure panda habitat. This represents just one of many possible examples.

Correlations Among Multiple Indicators. Overlaying maps of regional indicators of population density, population growth rates, levels of economic development, and conditions of the physical environment raises as many questions as the exercise answers. Population density and level of income fail to show a clear relationship across the earth’s regions: some regions are characterized by high population densities and are poor, whereas others are dense and wealthy; some sparsely populated regions are rich and others poor. However, studies show that the highest population growth rates tend to occur where economic growth has been least likely (Gallup et al. 1999). Resources in the less- or least-developed regions will be under intensified pressure as these regions cope with a larger share of the world’s population. That current population growth rates are particularly high in tropical regions is not a surprising result, given recent advances in disease control and the rapid adoption and diffusion of immunization, antibiotics, and other health-related technologies across the tropics. Historically, higher disease burdens, and especially a broad range of infectious diseases, have been typical. Also in the tropics, food security and agricultural productivity have been lower, given old soils (such as the soils of East Africa), human-degraded soils, higher insect pest burdens in agriculture, and high rainfall variability, among other factors. Gallup et al. (1999) observe the following when comparing worldwide population densities, population growth rates, economic development, and underlying geography:

(a) Tropical zones are “not conducive to [economic] growth,” due to the presence of malaria and other diseases (Gallup et al. 1999, p. 204), yet the prevalence of malaria is positively correlated with population density (Gallup et al. 1999, p. 209).

(b) High population density appears to be positive for (economic) growth in coastal regions but “inimical to growth in the hinterland” (Gallup et al. 1999, p. 204).


(c) Geography and policy “matter,” but good policy will not be enough to counter geographical disadvantage (Gallup et al. 1999, p. 204).

In short, some of what are considered the most vulnerable places on earth are inhabited by populations with high population growth rates and lower chances of developing economically, for reasons largely stemming from their natural environment and geography. McNeill’s (2006) assessment of challenges related to land use, water use, and air pollution clearly points to the pervasive issue of the distribution of resources, regional shortages, and the implications for the physical environment. The upward pressure of strong population growth against an already weak natural resource base encourages a variety of responses, often in combination: scavenging and other survival behaviors, out-migration (Zelinsky 1971), evolution toward greater agricultural intensification, urbanization, reductions in fertility (see the work of Graeme Hugo), and an array of other adaptive behaviors. Human mobility or migration in response to population change is recognized as a complex phenomenon, not easily explained. But differentials between more-developed and less-developed regions likely contribute to both “push” and “pull” incentives for internal and international migration flows.

The following section concentrates on understanding theory, adjustments, and impacts at the micro level (individuals, households, communities) and at macro/regional (meso) levels. The micro perspective is covered first because population-related decisions are typically made at this level and can have local repercussions. The micro is then linked to the macro/regional (meso) perspective, since environmental impacts importantly play out at these higher scales; a few examples include the impacts of international and rural-urban migration, exurban land use and the associated problem of sprawl, and problems stemming from transboundary pollution. The focus of Sect. 4 is on less- and least-developed places, because these places are among the most likely to be widely impacted by population-environment interactions. Such places also face the greatest challenges in solving the environmental problem through technological change.

4 Micro Perspective on Population Decisions and Community Resilience and High-Level Effects

Micro Perspective. A new micro perspective has emerged in population-environment research, especially related to less- and least-developed countries. Three aspects deserve emphasis. First, a primary feature of this perspective is the focus on household-level population dynamics and their relationships to environmental change. A second feature is the mediating variables approach: the analysis and identification of variables (e.g., poverty, government policies, cultural norms, market or nonmarket institutions) through which population dynamics affect environmental change (Hunter 2000). Mediating variables often reinforce or even reverse the role of population dynamics in environmental degradation or enhancement. Finally, it is recognized that there is more to population dynamics than population size and growth. New micro research has gone beyond attributing environmental degradation to increases in population, seeking to understand how other population variables – for example, household size, age, sex composition, and migration – affect and in turn are affected by environmental change.

At its core, the micro perspective is guided by the belief that observed patterns and trends of population dynamics and associated environmental change can be mapped to the individual household unit, which makes the actual decisions about production, consumption, and reproduction. Since the pioneering work of Gary Becker, researchers have sought to identify the determinants of fertility by focusing on the reproductive decisions of individual households. Demographic research over the following half century has uncovered that, apart from the available fertility-regulation techniques emphasized in the work of Richard Easterlin, household demand for children strongly influences fertility behavior. Demand for children in turn depends on income level, parental education, rules of inheritance, and other institutional factors, along with a host of other mediating variables. However, a major difficulty with the focus on the household as a decision-making unit is that households in reality are not monolithic. Questioning the so-called “unitary” view of the household making choices to maximize its welfare, Dasgupta (2003), among many others, points to ample evidence of gender inequities within poor households; these affect the allocation of education, food, healthcare, and other household resources. The unitary household model effectively ignores an important fact: if all benefits and costs are not borne by the decision maker, decisions may not be optimal. Women bear a disproportionate share of the cost of having children, including the cost of pregnancy, breastfeeding, daily care, and the risk of maternal mortality. Yet, in many traditional societies, men are the primary decision makers as to the desired number of children. Thus, the problem of externalities – the situation in which the decision maker does not bear all the costs and benefits of the decision – prevents optimal allocation even inside the household.

Outside the household, mediating factors may create population-environment externalities by driving a wedge between the benefits and costs borne by the decision maker. At times, these variables can create a feedback loop that contributes to a downward spiral of resource depletion, poverty, and high fertility. The self-reinforcing patterns of sustained high fertility in the face of declining environmental resources create “vicious cycles.” The key to understanding vicious cycles in the population-poverty interaction is the nonlinear response to changes in decision variables. Poor households depend on their own unskilled household labor, on low-productivity agricultural land held as private property, and on surrounding environmental resources available as common property. There are thresholds in the use of each of these endowments – points where the response to the decision variable changes nonlinearly.

Vicious Cycles, from Nutritional Status to Common Property Resources. Partha Dasgupta outlined a pioneering vicious cycle model (see Dasgupta (2003) and references therein) in which nutritional status has a critical threshold in terms of the capacity to work. Below that nutritional status, labor is not productive enough to grow enough food to achieve better nutrition.


Thus, the poor nutrition-low productivity state is self-sustaining. Above that threshold, a good nutrition-high productivity state is similarly self-reinforcing. For household private property, there can be a critical threshold of a productive asset (e.g., agricultural land) below which similar poverty traps may exist. The critical threshold in the extraction of common property resources is even more interesting. Since household decisions on the extraction of a common property resource are likely to be affected by expectations of other households’ decisions, strategic responses and peer effects may create vicious cycles and multiple equilibria, as explored in the rapidly expanding literature on social interactions across many disciplines.

Further, the interplay of population dynamics with these critical thresholds can change an existing vicious cycle. For example, fertility can modify Dasgupta’s model of nutritional status and the capacity to work. Poor nutritional status leading to higher infant mortality induces high fertility through the “insurance” value of additional births. With diminishing returns to labor, high fertility lowers labor productivity, contributing to lower nutritional status and higher mortality rates, further increasing the motivation for higher fertility. On the other hand, with abundant land, if production is within the range of increasing returns (e.g., the nineteenth-century US Western frontier), high fertility can reverse the initial vicious cycle of nutritional status and labor productivity. In a capital-scarce environment, greater labor intensification increases fertility by raising the demand for farm labor. High fertility, in turn, creates further demand for food. Further intensification contributes to land degradation, causing a declining resource base and further poverty. De Sherbinin et al. (2008) describe the literature on the relationship between fertility and the resource base – farm size, cattle and land, water, etc. Consistent with the vicious cycle hypothesis, they find the relationship to be negative. In contrast, some other authors have found the relationship to be positive; the postulated hypothesis is that households with larger landholdings have a greater demand for children in order to retain use rights over the land.

Common property resources provide another example of vicious cycles. In South Asia and sub-Saharan Africa, many rural households lack access to tap water or energy sources. They derive a considerable portion of their livelihood (firewood, timber, non-timber forest products, fish, bush meat, water, etc.) from common property natural resources. Households make decisions on childbearing as well as on labor allocation and marketing. Apart from the intrinsic benefits, the material benefits of children in household goods production are compared against the costs of birthing and child maintenance. When a substantial portion of livelihood comes from common property resources, a household’s share of the common property depends on the number of hands it can employ to convert common property to private property. Empirical studies from Nepal, Pakistan, and South Africa find evidence supporting a positive relationship between fertility and resource dependency. While having a large number of children exploiting the commons is marginally optimal from an individual household’s point of view, it is not optimal for the community. Greater entry into common access resources may lower labor productivity, which households try to offset by adding more hands.
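The threshold logic running through these examples can be given a minimal dynamic form (an illustrative sketch under simple assumptions, not Dasgupta’s own formulation). Let x_t denote a household’s nutritional or asset status and suppose it evolves as

x_{t+1} = f(x_t),

with f S-shaped and possessing three fixed points x_L < x^{*} < x_H. If f'(x_L) < 1 and f'(x_H) < 1 while f'(x^{*}) > 1, then x_L and x_H are stable and x^{*} is an unstable threshold: households starting below x^{*} converge to the low-level equilibrium (the trap), while those starting above it converge to the high-level equilibrium. In the same spirit, the “loss of resilience” discussed below corresponds to a shock pushing an ecological state variable across such a threshold into the basin of attraction of a degraded equilibrium.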
Offsetting in this way creates incentives for greater household size and higher fertility. But since every household is doing the same – without consideration of the effect of each household’s adding more hands on the average productivity and sustainability of resources – degradation accelerates. Environmental resources often have a critical threshold point at which the ecological system loses regenerative capacity, a phenomenon ecologists call “loss of resilience.” The concept of resilience has been used in two senses in the environmental science literature. The first, called “engineering resilience,” is defined by the time it takes to return to an equilibrium or steady state after a shock. We refer here to “ecological” or Holling resilience, which presumes the existence of multiple regimes or equilibria. “Loss of resilience” in the latter sense means that the ecological system moves to another equilibrium and no longer retains its former function, structure, identity, and feedbacks. Caught in the fertility-environmental degradation feedback loop, when competition to extract resources before others do becomes intense, the ecological system can gravitate toward the catastrophic equilibrium of “loss of resilience.”

The mediating variables approach is built on the recognition that in many regions of the world, the range of choices and trade-offs available to low-income households is affected by the quality and state of the surrounding environment on which their livelihood depends. Thus, informal institutions such as kinship, social norms, and culture, and formal institutions (e.g., well-functioning markets for land, labor, capital, and insurance), are important mediating variables in the population-environment nexus. In sub-Saharan Africa, fosterage within the kinship group has long been viewed as a determinant of high fertility, because the costs and responsibilities of children are diffused across the kin group. On the positive side, Elinor Ostrom shows that traditional communities often protect their local commons from overexploitation by relying on social norms that restrain individual community members. Here, social norms may resolve the “coordination problem.” Formal institutions are often thought to have subverted the traditional informal institutions of common resource management: the emergence of modern governments is often believed to have taken away the authority of villages to impose sanctions against those who violate locally instituted norms of use. As social norms degrade, parents pass off some of the cost of raising children to the community by overexploiting the commons. Baland and Platteau (1996) document cases in several Sahel states where such changes exerted detrimental impacts on traditional resource management practices. Access to formal markets can affect resource degradation both ways. On the one hand, access can increase resource-depleting productive activities due to higher prices for goods. On the other hand, increases in real wages may increase the harvesting cost of common property.

Relationship Between Migration and the Local Environment. Research has focused on the relationship between local environmental conditions and household-level decisions to migrate (de Sherbinin et al. 2008), emphasizing growing resource scarcity and the ability to access new resources elsewhere. Although more studies are needed to understand the impact of resource scarcity on a household’s decision to migrate, historical examples suggest that scarcity of land resources leads to waves of out-migration to new land. Besides European history, there are more recent instances of core-periphery movement in the developing countries.
Examples of core-periphery movement include movement from regions of Brazil and the Ecuadorian Andes to the Amazon. Migration to rural frontiers can give rise to new rural frontiers, using up valuable resources in the process: households settled in the first wave of frontier migration exhaust resources and then send younger members to settle in more distant areas, with the potential for the same pattern to repeat itself in the next generation.

Empirical research on the relationships between migration and the environment shows mixed results. Henry et al. (2004) show that in Burkina Faso, the risk of migration is higher in villages with unfavorable agroclimatic conditions. They also find that villages with increased water conservation technologies have a higher likelihood of out-migration. These effects appear to be more pronounced for short-term migration adopted as a strategy to diversify income sources in risky environments. Land scarcity has also been shown to be a key driver of migration in Uganda and Nepal. Extremely unfavorable environmental conditions can also hinder migration: the cost of migration is an investment, and severe resource constraints may limit the ability to make that investment. Remittances may have a beneficial impact on the local environment if purchased goods substitute for goods extracted from the local environment; they may also be invested in resource conservation. The farming system is one mediating variable that influences the impact of migration on environmental outcomes; impacts may be less significant in cattle-raising systems where labor demands are small. If there is no functioning credit and insurance market, migration may be adopted as a strategy to mitigate risk. On the other hand, VanWey (2005) finds that both a lack of land and a large amount of land can motivate migration in Thailand and Mexico, suggesting that individuals from households with large landholdings migrate to access capital for investments in technology and other agricultural inputs.

As countries complete the demographic transition with lower fertility and mortality, migration becomes an increasingly dominant force in demographic change. The traditional focus has been at higher levels (aggregate or macro/meso levels) and only on the impact on the destination environment. But more recent work has focused on the “push” created by environmental degradation at the origin and on the impacts on the origin itself.

Mobility and Population Pressure at the Macro and Meso Levels. The well-cited work of Wilbur Zelinsky builds on the underlying theory of human migration described by Ravenstein, Thomas, Stouffer, and Lee (see Zelinsky 1971 and references therein). Zelinsky (1971) explores the temporal and spatial dimensions of the transition in human mobility hypothesized to accompany modernization at macro/meso levels; his mobility transition model hypothesizes five stages in the transition. Hugo (2011) provides a useful review, criticizing models like Zelinsky’s for focusing on net migration and thereby failing to consider the two-way migration that is often observed. Spatial differentials across regions likely contribute to what Hugo (2011) described as the “increase in the scale and complexity of both internal and international movement over time” (Hugo 2011, p. S23).

1590

J. L. Findeis and S. Pervez

Further, in these stages – characteristic of many of the world's developing regions today – young adults tend to be found in higher concentrations in the population and typically have a higher likelihood of migrating than older age cohorts (see Hugo 2011 for discussion of other demographic effects). From an environmental perspective, out-migration to more resource-rich locations with lower population densities may reduce population pressure (and environmental stress) at the origin. However, the regional disruptions that can accompany out-migration may include the breakdown of the local social norms that keep environmentally harmful behaviors in check. On the other hand, out-migration accompanied by return migration can bring in new ideas and stimulate innovation, including new environmentally sustainable technologies. The high degree of temporary mobility that has been documented – for example, “international shuttling” between Mexico and the USA, and the growing density of commuting networks in many regions (Goetz et al. 2010) – may act as a generator of innovation diffusion. In addition, the interaction between new (e.g., mobile phone) technologies that generate product demand (to be “like the Joneses” in the more-developed world) and the truly massive flow of international migration remittances may stimulate consumption at the origin. What is clear from a review of the literature is that population growth, environment, and migration are likely interrelated but deserve much more attention.

Rural-Urban Adjustments. Finally, no review of the population-environment nexus would be complete without mention of the growing prominence of urban places across the globe. Urban places have become particularly powerful population magnets; the global urban population has now surpassed the global population defined as rural. This reflects domestic and international migration fueled by the concentration of GDP in cities. The trend appears to be worldwide; even in South and Southeast Asia, Africa, and Latin America, humans are rapidly concentrating in urban centers (PRB 2010). Long-term trends show that not only the US population but also US per capita GDP has become ever more concentrated in metro areas, a trend shared by the OECD countries and most countries worldwide. Rapid urbanization “hinders the development of adequate infrastructure and regulatory mechanisms for coping with pollution and other byproducts of growth, often resulting in high levels of air and water pollution and other environmental ills” (Hunter 2000, p. xiii). Urbanization can also alter “local climate patterns” (Hunter 2000), with concentrations of artificial surfaces creating heat islands.

Another common result is sprawl. Unless checked by geography or policy, population eventually spreads over the surrounding landscape. In some cases, consumption drives the transition. Greater mobility of the human population also contributes to the spillover onto nearby landscapes – for consumption of natural resource amenities, for perceived healthier lifestyles and safety, but also for lower costs of living. Environmental amenities in the countryside are recognized as strong attractants (see Cherry and Rickman 2011). Population growth and greater mobility, higher consumption levels in some places, and the age distribution of the population are expected to influence patterns of rural-exurban-urban transformation and the rates of transition of the landscape and ecosystem services.
Important related issues include food production for growing urban populations, the provision of specific ecosystem services, and the use of natural resources, including land proximate to population centers. Food availability and regional and local food production capacity represent major concerns, with access to food and water in the less- and least-developed regions being paramount. Finally, while countries are at different stages in this process, the process itself appears to be essentially the same everywhere (Findeis et al. 2009); that is, the population-driven changes taking place across exurban or peri-urban landscapes are in some respects very similar worldwide.

5 Conclusions

The total population living on earth continues to grow rapidly, putting significant pressure on the earth's ecosystems. While the rate of growth has recently declined, the absolute number of people sharing the earth's space continues to rise. Technological change (e.g., sustainable agricultural technologies), input substitution (e.g., substitution of biopower for fossil fuels), and the emergence of new institutions will help to reduce environmental problems in the developed countries, although impacts stemming from affluence and high levels of consumption will remain issues. Further, transboundary externalities – the shifting of the costs of economic development from the “haves” to the “have nots,” but also among the “haves” – will continue to take center stage in environmental discourse. Well-known examples include climate change, loss of biodiversity, and water quality and quantity, among other well-publicized environmental issues.

The less- and least-developed regions of the world will face particularly stiff challenges. Population growth rates are projected to be highest in these regions. As argued in this chapter, some of what are considered the most vulnerable places on earth are inhabited by populations with high growth rates and lower chances of developing economically, for reasons largely stemming from their environment and geography. The major challenge for these regions will be to remain resilient in the face of the dual pressures of their own growth and of the transboundary externalities created by others. Innovation should help to reduce impacts, but whether innovation will be targeted to sustainable solutions adaptable to these regions is a critical question.

For regional science, the challenge will be a greater focus on three issues: (i) the interface between the earth's more-developed and less-/least-developed regions, and how to reduce environmental impacts stemming from the developed world; (ii) development within the less-/least-developed regions of the globe that reduces environmental impact as growth occurs; and (iii) the three-way interaction among population, environment, and migration. Population change was identified by the Social, Behavioral and Economic Sciences Directorate of the National Science Foundation as one of four major topic areas for future research (NSF 2011), and “sources of disparities” is the second of the four areas identified for major emphasis. Regional scientists can provide insight into both of these challenges, but can especially deepen our understanding of how to effectively reduce differentials between affluent and poor regions of the world in this century, a challenge of paramount importance.
As shown in this chapter, the incentives, thresholds, and nonlinear relationships surrounding population change and sources of disparities are highly complex. In many respects, we are only in the early stages of understanding the complex mechanisms and relationships that underlie the population-environment nexus. The evolution of the interrelationships among regions, a purview of regional science, is known for its complexity. Also complex is how to balance – over a very short period of time – population growth against the food, water, and other resource requirements of humans in different regions of the world. Hunter (2000) argues that we need a more “precise scientific understanding of the complex interactions between demographic processes and the environment” (Hunter 2000, p. xx in the report's foreword). Scientists, including regional scientists, will need to collect robust data, develop new models that link natural and social-economic-behavioral processes, and build a compelling library of credible evidence from across the globe to inform decision-making and governance at multiple scales. A major challenge will be coordinating this work across multiple scales and with multiple disciplines and publics to avoid what could be catastrophic impacts. However, there is really no choice.

6 Cross-References

▶ Climate Change and Regional Impacts
▶ Relational and Evolutionary Economic Geography
▶ Schools of Thought on Economic Geography, Institutions, and Development
▶ Urban and Regional Sustainability

References
Baland JM, Platteau JP (1996) Halting degradation of natural resources: is there a role for rural communities? Clarendon, Oxford
Birdsall N (1988) Chapter 12: Economic approaches to population growth. In: Chenery H, Srinivasan TN (eds) Handbook of development economics, vol 1. Elsevier, Amsterdam, pp 477–542
Campbell M (2007) Why the silence on population? Popul Environ 28(4):237–246
Carson R (1962) Silent spring. Houghton Mifflin, Boston
Carson RT (2010) The environmental Kuznets curve: seeking empirical regularity and theoretical structure. Rev Environ Econ Policy 4(1):3–23
Cherry T, Rickman D (2011) Environmental amenities and regional economic development. Routledge, London
Coale AJ, Hoover EM (1958) Population growth and economic development in low-income countries: a case study of India's prospects. Princeton University Press, Princeton
Dasgupta P (2003) Chapter 5: Population, poverty, and the natural environment. In: Maler KG, Vincent JR (eds) Handbook of environmental economics, vol 1, 1st edn. Elsevier, Amsterdam, pp 191–247


de Sherbinin A, VanWey LK, McSweeney K, Aggarwal R, Barbieri A, Henry S, Hunter LM, Twine W, Walker R (2008) Rural household demographics, livelihoods and the environment. Glob Environ Chang 18(1):38–53
Edwards R (2011) Changes in world inequality in length of life: 1970–2000. Popul Dev Rev 37(3):499–528
Ehrlich PR, Holdren JP (1971) Impact of population growth. Science 171(3977):1212–1217
Findeis J, Brasier K, Salcedo Du Bois R (2009) Chapter 2: Demographic change and land use transitions. In: Goetz S, Brouwer F (eds) New perspectives on agri-environmental policies: a multidisciplinary and transatlantic approach. Routledge, Abingdon, pp 13–40
Gallup JL, Sachs JD, Mellinger AD (1999) Geography and economic development. Int Reg Sci Rev 22(2):179–232
Galor O (2005) Chapter 4: From stagnation to growth: unified growth theory. In: Aghion P, Durlauf S (eds) Handbook of economic growth, vol 1, 1st edn. Elsevier, Amsterdam, pp 171–293
Goetz S, Han Y, Findeis J, Brasier K (2010) US commuting networks and economic growth: measurement and implications for spatial policy. Growth Chang 41(2):276–302
Grossman GM, Krueger AB (1995) Economic growth and the environment. Q J Econ 110(2):353–377
Henry S, Schoumaker B, Beauchemin C (2004) The impact of rainfall on the first out-migration: a multi-level event-history analysis in Burkina Faso. Popul Environ 25(5):423–460
Hugo G (2011) Future demographic change and its interactions with migration and climate change. Glob Environ Chang 21(Suppl 1):S21–S33
Hunter LM (2000) The environmental implications of population dynamics. RAND, Santa Monica, 33 pp
Malthus TR (1798) An essay on the principle of population, as it affects the future improvement of society, with remarks on the speculations of Mr. Godwin, M. Condorcet, and other writers, 1st edn. J. Johnson, London
McNeill JR (2006) Population and the natural environment: trends and challenges. Popul Dev Rev 32(Suppl):183–202
Nordhaus WD (1973) World dynamics: measurement without data. Econ J 83:1156–1183
NSF (National Science Foundation) (2011) Rebuilding the mosaic: fostering research in the social, behavioral, and economic sciences at the National Science Foundation in the next decade. NSF 11-086, Arlington, 63 pp
PRB (Population Reference Bureau) (2010) 2010 World population data sheet. http://www.prb.org/pdf10/10wpds_eng.pdf. Accessed 30 Sept 2018
PRB (Population Reference Bureau) (2018) 2018 World population data sheet with a special focus on changing age structures. 19 pp
Scherbov S, Lutz W, Sanderson WC (2011) The uncertain timing of reaching 8 billion, peak world population, and other demographic milestones. Popul Dev Rev 37(3):571–578
UNFPA (United Nations Population Fund) (2018) The power of choice: reproductive rights and the demographic transition. UNFPA State of the World Population 2018. United Nations Population Fund, New York, 156 pp
United Nations, Department of Economic and Social Affairs, Population Division (2015) World fertility patterns 2015 – data booklet (ST/ESA/SER.A/3701)
VanWey LK (2005) Land ownership as a determinant of international and internal migration in Mexico and internal migration in Thailand. Int Migr Rev 39(1):141–172
Vitousek PM, Mooney HA, Lubchenco J, Melillo JM (1997) Human domination of Earth's ecosystems. Science 277(5325):494–499
Zelinsky W (1971) The hypothesis of the mobility transition. Geogr Rev 61(2):219–249

Section IX: Spatial Analysis and Geocomputation

Geographic Information Science

Michael F. Goodchild and Paul A. Longley

Contents
1 Introduction
2 Principles of GIScience
2.1 The Characteristics of Geographic Information
2.2 Dealing with Large Data Volumes
2.3 Scale-Related Issues
2.4 Simulation in GIScience
2.5 Achievements of GIScience
3 Changing Practice and Changing Problems
3.1 CyberGIS and Parallel Processing
3.2 The Social Context of GISystems
3.3 GIScience and Data Science
3.4 Neogeography, Wikification, Open Data, and Consumer Data
4 Conclusion
References

Abstract

This chapter begins with a definition of geographic information science (GIScience). We then discuss how this research area has been influenced by recent developments in computing and data-intensive analysis, before setting out its core organizing principles from a practical perspective. The following section reflects on the key characteristics of geographic information, the problems posed by large data volumes, the relevance of geographic scale, the remit of geographic simulation, and the key achievements of GIScience to date. Our subsequent review of changing scientific practices and the changing problems facing scientists addresses developments in high-performance computing, heightened awareness of the social context of geographic information systems (GISystems), and the importance of neogeography in providing new data sources, in driving the need for new techniques, and in heightening a human-centric perspective.

M. F. Goodchild (*)
Center for Spatial Studies and Department of Geography, University of California, Santa Barbara, CA, USA
e-mail: [email protected]

P. A. Longley
Consumer Data Research Centre and Department of Geography, University College London, London, UK
e-mail: [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_61

Keywords

GIScience · GISystem · Spatial analysis · Fourth paradigm · Big data

1 Introduction

Geographic information science (GIScience) addresses fundamental issues associated with geographic information and the use of geographic information systems to perform spatial analysis, using a scientific approach (for detailed discussions of the nature of geographic information science, see Duckham et al. 2003). The issues may be practical, as in the question of how to address uncertainty in geographic information; they may be empirical, as in the observation generally known as Tobler's First Law of Geography, that nearby things are more similar than distant things (Tobler 1970); or they may be theoretical, as in the fundamental contribution known as the 9-intersection of topology, which formally classifies all topologically distinct relations between two objects (Egenhofer and Franzosa 1991). To some, the term implies the use of a GISystem as a scientific tool in research and decision-making, and as such it has been widely applied to the solution of virtually any problem that is embedded in geographic space, from global warming to crime and water pollution. Much progress has been made in GIScience in the two decades since the term was coined (Goodchild 1992), through the efforts of a growing scientific community. It is also important to note that GIScience plays an important role in the practice of regional science, both as a technology that can support research and as an approach to problem-solving.

Many other terms overlap to some degree with GIScience. For example, the term geocomputation is also fundamentally concerned with geographic information – in other words, information about features and phenomena and their locations on or near the Earth's surface. Coined a little later, by Openshaw and Abrahart (1996), the term is often used in cross-sectional analysis to describe the repeated analysis and simulation of spatial distributions, in order to explore those distributions and to draw inferences about the processes that created and govern them. More specifically, the term is often taken to imply simulation of processes operating in the geographic domain, and thus geographic information that captures evidence of those processes and is primarily dynamic. The major issues in geocomputation often center on the computational problems that arise in simulating complex systems with massive numbers of features, data items, or agents. In this sense geocomputation develops an application-led focus upon the way the world works, founded upon rich digital representations of the way that the world looks, and makes prediction a central goal.
The main contribution of geocomputation may thus lie in the development of better tools for dealing with complex, dynamic systems and for predicting their future states.

The remainder of this section elaborates on the basic definition of GIScience and the research conducted under its banner. This is followed by a discussion of the basic principles of GIScience, with an emphasis on those areas where GIScience has been successful at solving computationally intensive problems. Major methods of spatial analysis (or analytics) using GISystems are also reviewed. The third section of the chapter addresses changing practices in GIScience, focusing on the increasing importance of collaboration, on novel and diverse data sources, on the availability of massive computational resources, and on the problems of dealing with uncertainty. Science generally is changing in response to the need to study complex systems and the use of simulation, and this trend is certainly affecting GIScience. The concept of data-intensive science – the Fourth Paradigm (Hey et al. 2009) – has a natural fit to geographic problems and their massive volumes of data, while the meta-issues of generalization, documentation, and provenance are beginning to loom large in a science that is no longer dominated by the individual investigator. The very rapid growth of the discipline of data science in recent years is also challenging many traditional approaches to science, and its relationship to GIScience is addressed in this third section. Finally, the fourth major section speculates on the future and discusses the continuing evolution of GIScience. Future developments are likely to be driven, as in the past, by trends in data, in computation, and in the society that forms the context for both fields.

While debates about the nature and meaning of science have raged for centuries and will probably never end, the core ideas are clear. First, science seeks laws and principles that can be shown to be valid in the observable world and are generalizable in the sense that they apply everywhere and at all times. Both of the examples cited earlier – Tobler's First Law and the 9-intersection – are clearly of this nature, and as a theoretical result the 9-intersection not only applies everywhere at all times but also applies in any imaginable space. Second, science is founded on definitions of terms that are rigorously stated and understood by all scientists. Third, scientific experiments and their results are replicable, being stated in sufficient detail that someone else could expect to obtain them by carrying out an identical experiment. In this context the term black box is pejorative, since procedures that are hidden inside a box cannot be described and therefore cannot be replicated. Well-understood principles also apply to the details of reporting, as in the rule that any measurement or numerical result be stated to a precision (number of significant digits) that reflects the accuracy of the measuring device or model. Principles such as these help to define GIScience and to distinguish it from less rigorous applications of GISystems and related technologies.

A distinction is often drawn between pure science, or science for the sake of curiosity and the quest for general discoveries, and applied science, or science that aims to solve problems in the observable world using scientific methods.
The geo- prefix reminds us that the Earth provides a unique laboratory for scientific investigation, and the uniqueness of the places on it often limits the scope for the kinds of controlled experiments that characterize scientific activity in other disciplines. Geographic space is the space of human activity, and most of the problems facing human society are embedded in it, from poverty and hunger to health. Indeed, it is sometimes hard to avoid application in GIScience because the field is inevitably close to the real world, a fact that perhaps accounts for at least some of the passion displayed by its practitioners. Moreover, curiosity has often provided the motivation to explore, characterize, and map the geographic world, though the results of such exploration are rarely generalizable in the sense that Newton's laws of motion or the Mendeleev periodic table are generalizable.

This pure/applied distinction explains how progress in spatial analysis is measured. On the one hand, the refereed journals in which much successful GIScience research is published, and the presentations at conferences such as the biennial International Symposia on Geographic Information Science, emphasize the purer forms of science, while other conferences, such as the biennial International Conferences on Geocomputation, emphasize how the core organizing principles and concepts of GIScience can be brought to bear on solving practical problems. A large industry, valued according to some estimates at $20 billion annually (Longley et al. 2015), has sprung up around the data acquisition and tools needed in such practical problem-solving. Clearly the metrics of success here are much more diverse than in pure science.

2 Principles of GIScience

In this section we describe some of the major achievements of GIScience in its first two decades. We begin with a discussion of the characteristics that distinguish geographic information and geographic problem-solving from data-driven science in other domains. We then discuss the strategies that have been adopted in GIScience for avoiding or successfully dealing with the problems of large data volumes, including aggregation, divide-and-conquer, and compression. We discuss some of the unintended consequences of such strategies, in the form of uncertainty, the ecological fallacy, and the modifiable areal unit problem. We elaborate on the nature of simulation in geographic space, on some of the more successful research conducted in this area, and on some of the issues it raises. Finally, we present a brief summary of progress in GIScience in the past 20 years.

2.1 The Characteristics of Geographic Information

One of the first attempts to identify the special characteristics of geographic information, or “what is special about spatial?”, was made by Anselin (1989). He argued that two characteristics were universal: spatial dependence and spatial heterogeneity.


Reference has already been made to the first, in the form of Tobler's First Law of Geography: “All things are similar, but nearby things are more similar than distant things.” The statement may appear informal and vague, but it is readily formalized in the principles of regionalized variables that underlie the science of geostatistics and in the models widely used in spatial statistics. While we can argue about whether the statement meets the criteria for a law as that term is normally understood by philosophers of science, and whether exceptions should be allowed, it is clear that the vast majority of phenomena distributed over the Earth's surface and near-surface adhere to it, while differing in precisely how similarity decays with distance. Moreover, there is no doubt of the law's efficacy in the design and implementation of GISystems. The principle is essentially one of context, since it requires a phenomenon at one point to be consistent with the same phenomenon at nearby points. It appears to apply well in three-dimensional space and also in four-dimensional space-time. Perhaps the easiest way to demonstrate its validity is by a thought experiment in which it is not true, where a minute displacement on the Earth's surface produces a completely independent environment – clearly this does not and cannot happen, though there are many examples where a displacement of at least some finite amount produces an apparently independent environment (the minimum such displacement is known in geostatistics as the range of the phenomenon, and synonyms exist in many domains of science).

As a cornerstone of GIScience, the principle has two major implications. First, similarity over short distances allows the Earth's surface to be divided into regions within which phenomena are approximately homogeneous, achieving great economies in data volume by expressing attributes as properties of entire areas rather than of individual points. In short, the principle enables the assumed-homogeneous polygons that dominate many representations in GISystems. Similarly, it allows reasonable guesses to be made about the properties of places that have not been visited or measured, in a process known as spatial interpolation. The principle thus justifies the techniques that are used, for example, to create weather maps from scattered point observations.

Unfortunately, the principle of spatial dependence also provides a major headache for researchers working with geographic information, since it runs counter to the assumption made in many statistical tests that the data were acquired through random and independent sampling from a parent population. An analysis of the 58 counties of California, for example, cannot make that assumption, since the principle implies that conditions in neighboring counties will be similar. Moreover, there is no larger universe of which the set of all counties of California constitutes a random sample.

Anselin's second principle addresses spatial heterogeneity, or the tendency for parts of the Earth's surface to be distinct from one another. This also has profound implications. Consider, for example, a local agency seeking to define a taxonomy of local land use. The result will inevitably differ depending on the agency's location and the local conditions in its jurisdiction, and every jurisdiction will argue that its scheme is better than any global or national standard. In early geodesy, the figure of the Earth (the mathematical function used to approximate the Earth's shape and thus define latitude and longitude) was unique to each jurisdiction or region, and it was not until the 1960s that pressure for a single standard prevailed, driven by the growing importance of air travel and the targeting of intercontinental ballistic missiles.
Unfortunately, any universal standard will inevitably be suboptimal for any local jurisdiction, whether the question is land-use classification or the shape of the Earth, so there will always be tension between the desire to be locally optimal and the desire to be globally universal.
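Returning to the first of Anselin's principles: spatial dependence is what justifies spatial interpolation, and the idea can be made concrete in a few lines of code. The following is a minimal sketch of inverse-distance-weighted (IDW) interpolation, one of the simplest techniques that exploits the decay of similarity with distance; the function, sample readings, and power parameter are illustrative assumptions rather than anything specified in this chapter.

import numpy as np

def idw(sample_xy, sample_z, query_xy, power=2.0):
    # Estimate values at unvisited locations as weighted averages of
    # observations, with weights that decay with distance: a direct
    # expression of Tobler's First Law.
    d = np.linalg.norm(query_xy[:, None, :] - sample_xy[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)          # guard against division by zero
    w = 1.0 / d ** power
    return (w * sample_z).sum(axis=1) / w.sum(axis=1)

# Four scattered temperature readings and two unvisited locations
obs_xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
obs_z = np.array([10.0, 12.0, 11.0, 15.0])
print(idw(obs_xy, obs_z, np.array([[0.5, 0.5], [0.9, 0.9]])))

The same weighting logic, fitted to the data rather than assumed, underlies the geostatistical interpolation built on regionalized variables mentioned above.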

2.2 Dealing with Large Data Volumes

The previous section was concerned with principles that can be demonstrated to be empirically true. We now move to some of the principles that guide the design of GISystems and allow these systems to deal with problems that might otherwise be overwhelmingly voluminous – a key issue given the goal of addressing large problems. The Earth's surface has an area of approximately 500 million km², so a description of it at a resolution of 1 m² would create 500 trillion data elements if no strategy were adopted to reduce the volume. Even allocating a single byte to each data element would create half a petabyte of data.

In the previous section we discussed Tobler's First Law, the basis for aggregating data elements into statements about entire polygons that have been delimited on a cartographically projected Earth. California's land area amounts to 403,800 km², and describing each square meter with the two-byte abbreviation CA would produce roughly 0.8 terabytes of data. But capturing the coordinates of the state's boundary and attaching the single attribute CA to the resulting polygon could clearly compress this to only a few kilobytes, even with precise coordinates, and, by recording only a single attribute, would avoid the potential for error in the vast number of identical attributes that would have to be recorded in a cell-by-cell approach. Alternatively, a variety of compression techniques can be used to replace runs of identical data elements with a series of (value, run-length) pairs. Many other methods of compression, generalization, and abstraction have been devised to deal with the volume problem – some of them lossy, in the sense that the result is only approximately identical and the original data cannot be recovered from the compressed version, and some of them lossless.

In a divide-and-conquer strategy, a geographic area is partitioned and analysis or modeling proceeds one partition at a time. The term tile is often used for a partition, especially where the partitions are rectangular (on a cartographically projected Earth). Instead of solving a problem for the whole of California, for example, one might solve it separately for each of its counties. Interactions exist between counties in almost every application: in analyzing water pollution, for example, the actions of a county will influence water quality in any downstream county, and air pollution will travel to any county that lies downwind. Thus a successful divide-and-conquer strategy must also consider the degree to which counties interact and include this in the model, often by iterating between modeling within-county effects and modeling between-county effects.
Nevertheless, the overall computational efficiency of the modeling will probably be improved by adopting this strategy. Many algorithms used in GISystems make explicit use of divide-and-conquer as an approach to handling the vast amounts of data provided by satellite-based remote sensing, and implicit divide-and-conquer has been an intrinsic part of human problem-solving from time immemorial. Despite these traditional strategies, we note in the next major section that massive improvements in computing capacity, along with the increasing availability of fine-resolution data on both environmental and social phenomena, are opening a host of new possibilities. It is increasingly possible to avoid the constraints of divide-and-conquer and to study processes at previously unheard-of resolution.
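As a concrete illustration of the lossless case, the sketch below run-length encodes a single row of a categorical raster into the (value, run-length) pairs described above; the function names and the toy row of state labels are illustrative assumptions.

def rle_encode(row):
    # Collapse each run of identical cells into a (value, run-length) pair
    runs = []
    for cell in row:
        if runs and runs[-1][0] == cell:
            runs[-1][1] += 1
        else:
            runs.append([cell, 1])
    return [tuple(r) for r in runs]

def rle_decode(runs):
    # Expand the pairs back into the original row of cells
    return [value for value, length in runs for _ in range(length)]

row = ["CA"] * 6 + ["NV"] * 3 + ["CA"] * 2
packed = rle_encode(row)
print(packed)                      # [('CA', 6), ('NV', 3), ('CA', 2)]
assert rle_decode(packed) == row   # lossless: the row is fully recovered

Like the polygon representation, the encoding records each run's attribute exactly once, removing both redundancy and the opportunity for error within a run.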

2.3 Scale-Related Issues

The term scale is often used in GIScience in the sense of spatial resolution, to distinguish between fine-scale (detailed) data and coarse-scale (generalized) data. Some of the techniques described in the previous section essentially sacrifice scale in the interests of reducing data volume. To a cartographer, reducing a map's representative fraction – the ratio of distance on the map to distance on the Earth – is similarly a sacrifice of scale, often in the interests of visual clarity. To a compiler of social statistics, reporting counts of people for large, aggregated reporting zones may also be a means of reducing data volume. All of these techniques have consequences that are well recognized in GIScience.

The modifiable areal unit problem (MAUP) refers to the effects that changes in reporting zone boundaries have on the results of any geographic analysis. The problem was first formally characterized by Openshaw (1983), who demonstrated that changing reporting zone boundaries could produce dramatic swings in results, even when holding scale constant. His solution, which became a fundamental tenet of GIScience, was to explore the aggregation effect in any specific case by repeated analysis using different zones. Unfortunately, in most cases this can only be done by aggregating predefined zones, producing different results but at a still coarser level of aggregation, since data compiled for different zonal arrangements at the same level of aggregation will usually not be available.

Many studies have documented the problem, while others have argued that it results not from a failure of analytic method but from a failure on the part of the investigator to be explicit about the scale at which the hypothesized effects occur. For example, in Openshaw's original case study, the 99 counties of Iowa were used to explore the relationship between the percent of the population over 65 and the percent registered Republican voters. Aggregating the counties in various ways did indeed produce different results, but at coarser scale. What is missing in this case is a well-defined hypothesis as to why this correlation should appear and at what scale. Perhaps the process works at the individual level, and older people are more likely to vote Republican, in which case the hypothesis is best tested at the individual level. Or perhaps the process is ecological: a neighborhood with a large percent of people over 65 also attracts a large percent of Republican voters, whether or not they are over 65.
In the latter case, the appropriate scale of analysis is that of the neighborhood, requiring a formal definition of that concept and an aggregation of finer-scale data, such as block-group data, to the neighborhood level. The general point is that we should not be looking for statistics that are invariant to the phenomenon we wish to study; as such, the MAUP is not an empirical problem but rather a theoretical requirement to hone statistics to the geographic context in which they are applied.

A closely related problem, also well recognized in GIScience, is the ecological fallacy: the fallacy of reasoning from the aggregate to the individual. The fallacy already appeared in the previous paragraph, since it would be wrong to infer from a county-level correlation that individuals over 65 tend to vote Republican – in the extreme, Openshaw's correlations could exist in Iowa at the county level even though no person over 65 was a registered Republican.
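The instability Openshaw demonstrated is easy to reproduce with synthetic data. The sketch below, in which all numbers and zone assignments are illustrative assumptions, shows how correlations computed from zone means can swing between two zoning schemes even when the individual-level correlation is near zero.

import numpy as np

rng = np.random.default_rng(7)
n = 1200
x = rng.normal(size=n)            # individual-level variable (e.g., an age proxy)
y = rng.normal(size=n)            # individual-level variable (e.g., a voting proxy)
# Two alternative zoning schemes: the same number of zones but different
# "boundaries," modeled here simply as different assignments to 12 zones
zones_a = rng.integers(0, 12, size=n)
zones_b = rng.integers(0, 12, size=n)

def zone_level_r(zones, x, y):
    # Aggregate individuals to zone means, then correlate the zone means
    ids = np.unique(zones)
    mx = np.array([x[zones == z].mean() for z in ids])
    my = np.array([y[zones == z].mean() for z in ids])
    return np.corrcoef(mx, my)[0, 1]

print("individual-level r:", np.corrcoef(x, y)[0, 1])  # near zero
print("scheme A zone-level r:", zone_level_r(zones_a, x, y))  # these two
print("scheme B zone-level r:", zone_level_r(zones_b, x, y))  # generally differ

Reading either zone-level correlation as evidence about individuals would, of course, commit the ecological fallacy discussed above.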

2.4 Simulation in GIScience

Many processes that operate on the Earth's surface can be abstracted in the form of simple rules. One might hypothesize, for example, that consumers always purchase groceries from the store that can be reached in minimum time from their homes. Exactly how such hypotheses play out in the real world can be difficult to predict, however, because of the basic heterogeneity and complexity of the Earth's surface. Christaller was able to show that such simple assumptions about behavior led to simple patterns of settlements in areas dominated by agriculture, but only by assuming a perfectly uniform plane. Similarly, Davis was able to theorize about the development of topography through the process of erosion, but only by assuming a starting condition of a flat, uplifted block. Research in both areas has clearly demonstrated that the perfect theoretical patterns predicted never arise in practice.

One strategy for addressing such issues is to assume that, in the infinite complexity of the real world, all patterns are equally likely to emerge and that the properties we will observe are those that are most likely. This strategy enabled Wilson (1970) to show that the most likely form of distance decay in human interaction is the negative exponential, and Shreve (1966) to show that the effect of random development of stream networks would be the laws previously observed by Horton. Similar approaches have been applied to the statistical distribution of city sizes and to the patterning of urban form (Batty and Longley 1994). Nevertheless, while they yield results that are often strikingly in agreement with reality, such approaches lack the practical value that real-world decision-making demands.

Instead, GIScience is increasingly being used to simulate the effects of simple hypotheses about behavior on the complex landscapes presented by the geographic world. The generality of such approaches lies in the hypotheses they make about behavior; the landscapes they address, and the patterns they produce, are essentially unique. Such approaches fall into two major categories, depending on how the hypotheses about behavior are expressed. The approach of cellular automata begins with a representation of the landscape as a raster and implements a set of rules about the conditions in any cell of the raster.
The approach was originally popularized by Conway in his Game of Life, in which he was able to show that distinct patterns emerge through the playing out of simple rules on a uniform landscape. Such patterns are known as emergent properties, since they would be virtually impossible to predict through mathematical analysis. The cellular-automata approach has been used by Clarke (e.g., Clarke and Gaydos 1998) and others to simulate urban growth, based on simple rules that govern whether or not a cell will change state from undeveloped to developed. Such approaches allow for the testing of policy options, expressed in the form of modifications to the rules or to the landscape, and have been widely adopted by urban planners.

The alternative approach centers on the concept of the agent, an entity that is able to move across the geographic landscape and behave according to specified rules. This agent-based approach is thus somewhat distinct from the cell-based approach of cellular automata. Agent-based models have been widely implemented in GIScience. For example, Torrens et al. (2011) have studied the behavior of crowds using simple rules of individual behavior, with applications in the management of large crowds and their potential for panic and mass injury. Evans and Kelley (2004) have studied the behavior of decision-makers and their role in the evolution of rural landscapes, examining policies that may lead to less fragmentation of land cover and thus greater sustainability of wildlife.

Both approaches raise a number of issues (for a general discussion, see, e.g., Parker et al. 2003). From an epistemological perspective, several authors have explored the role of such modeling efforts in advancing scientific knowledge. On the one hand, a model is only as good as the rules and hypotheses about behavior on which it is based. It is unlikely that the results of simulation will lead directly to a modification of the rules; more likely, rules will be improved through controlled experiments outside the context of the modeling. If patterns emerge that were unexpected, one might argue that scientific knowledge has advanced, but on the other hand such patterns may be due to the specific details of the modeling and may not replicate anything that actually happens in the real world. Validation and verification of simulation models are always problematic, since the results purport to represent a future that is still to come. Hindcasting is a useful technique, in which the model is used to predict what is already part of the historical record, usually by working forward from some time in the past. But the predictions of a model will never replicate reality perfectly, forcing the investigator to ask what level of error in prediction is acceptable and what is not. Moreover, it is possible and indeed likely that the rules and hypotheses about social behavior that drive a model will change in the future. In that regard, models of physical processes may be more reliable than models of social processes.
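A minimal cellular-automaton sketch in the spirit of such urban-growth models is given below. The rule, grid size, seed, and threshold are illustrative assumptions, not Clarke's actual model: an undeveloped cell becomes developed once at least three of its eight neighbors are developed.

import numpy as np

def step(grid, k=3):
    # Count developed neighbors by summing the eight shifted copies of the grid
    p = np.pad(grid, 1)
    rows, cols = grid.shape
    neighbors = sum(
        p[1 + di : 1 + di + rows, 1 + dj : 1 + dj + cols]
        for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)
    )
    # Rule: an undeveloped cell (0) develops (1) when k or more neighbors have
    return np.where((grid == 0) & (neighbors >= k), 1, grid)

grid = np.zeros((20, 20), dtype=int)
grid[9:12, 9:12] = 1                  # a 3 x 3 developed "seed" settlement
for _ in range(5):
    grid = step(grid)
print(grid.sum(), "cells developed")  # growth emerges from the purely local rule

Policy options are tested in exactly the way described above: by modifying the rule (the threshold k) or the landscape (the initial grid, or cells excluded from development).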

2.5 Achievements of GIScience

As we noted earlier, the term GIScience was coined in a 1992 paper (Goodchild 1992). In some ways the paper was a reaction to comments being made in the literature about the significance of GISystems: that they were little more than tools and did not therefore deserve a place in the academy.
The funding of the US National Center for Geographic Information and Analysis (NCGIA) in 1988 by the National Science Foundation seemed to indicate a willingness in some quarters to see more in GISystems than technique. Nevertheless, the tool/science debate continued for some time and is summarized by Wright et al. (1997). While any level of consensus is inevitably difficult to achieve, the following might be argued to be the major achievements of two-and-a-half decades of GIScience:

• Clarification and specification of the basic data model, including recognition of the fundamental significance of discrete-object and continuous-field conceptualizations, the emergence of object-oriented data modeling, and the specification of spatial relations.
• The development of place-based techniques of spatial analysis, including local indicators of spatial association (Anselin 1995; Ord and Getis 1995), spatial regression models (LeSage and Pace 2009), and geographically weighted regression (Fotheringham et al. 2002).
• The specification of standards for simple features, metadata, real-time interaction across the Internet, and many other aspects of GISystem practice, led by the Open Geospatial Consortium and the US Federal Geographic Data Committee.
• The development of digital globes such as Google Earth that allow real-time interaction with three-dimensional models of the Earth.
• Recognition of the importance of ontology as the key to interoperability across communities, languages, and cultures.
• Search and retrieval based on geographic location, through mechanisms such as the geoportal (Maguire and Longley 2005).
• Advances in geovisualization, going far beyond the capabilities of conventional cartography to include animation, the third spatial dimension, reduction of high-dimensional data sets, and many other topics.
• Achievement of a new level of understanding of uncertainty in geographic information, its handling, and its effects, together with a fundamental shift of focus from accuracy to uncertainty.
• Focus on the source and operation of bias in geographic representation, particularly where Big Data sources are repurposed for research applications that were not anticipated or intended when the data were created (Longley et al. 2018). This can be seen as retaining focus upon data collection methods and their suitability for spatial analysis in the Big Data era.

Perhaps more important are the institutional achievements, which can be seen as the indirect result of such advances. GIScience is now widely recognized in the titles of journals and the names of departments and programs. In recent years several GIScientists have been elected to prestigious institutions such as the US National Academy of Sciences and the UK's Royal Society. GIScience conferences have proliferated, and the GIScience bookshelf now contains an impressive array of titles.

3 Changing Practice and Changing Problems

In this section we examine the changing nature of GIScience and speculate on its future. GISystems have always been driven by competing factors. On the one hand, they are at the mercy of trends and changes within the larger computing industry, including new technologies that may or may not offer significant benefits; the relational database management systems of the 1970s, for example, led to a major breakthrough in data modeling in GISystems. On the other hand, these systems have been driven by the need to solve problems of importance to society, from the resource management that provided the initial applications in the 1980s, to the military applications that have always been important but half-hidden, to new applications in public health that are as yet only partially developed. GISystems as tools for science are subject to the winds of change currently blowing through the scientific community, pushing it toward a more collaborative, multidisciplinary, and data-centric paradigm. Finally, GISystems exist in a social context of concerns about privacy, about scientific practices, and about the role that sometimes expensive technology can play in empowering the already empowered, and they are being influenced by the growing importance of the average citizen as both a consumer and a producer of geographic information.

This section is structured as follows. We begin with a discussion of high-performance computing and its importance for the kinds of massive simulation models discussed previously. We then move to a discussion of the social context of GISystems and the social critique that emerged in the 1990s and now drives the research of many GIScientists. This is followed by a discussion of the relationship between GIScience and data science. Finally, we examine the phenomenon of neogeography and the importance it may hold in providing new data sources and in driving the need for new techniques.

3.1 CyberGIS and Parallel Processing

A major report of the US National Science Foundation (NSF 2003) proposed the term cyberinfrastructure to describe the kinds of computing infrastructure that would be needed to support science in the future. Instead of the lone investigator and the desktop system, the report envisioned a distributed infrastructure that would support widespread collaboration across a range of disciplines, following the notion that science in the future would address complex problems with complementary teams of scientists of varied expertise. The solution of complex, large-scale problems would also require heavy investment in high-performance computing (HPC) with its massively parallel architectures. Parallel architectures have an inherently good fit to the nature of geographic space and its somewhat independent individual and community agents, all of which can be seen as semi-independent decision-makers acting in parallel rather than serially.

A number of authors have argued that geographic research and problem-solving require a specific form of cyberinfrastructure that addresses several key issues, and they have coined the term cyberGIS (Wang 2016).
How exactly should the geographic world be partitioned across processors? How should one measure computational intensity as a geographic variable? How should the user interface of an integrated cyberGIS be designed? What types of problems, models, and analyses best justify these new approaches? What incentives will persuade the average GIScientist to engage with cyberGIS, given the initial impression of complexity and inaccessibility and a high level of personal investment in conventional GISystems?

Efforts to parallelize GISystem architectures date from the 1990s but were not successful, for several reasons. First, parallel computing was expensive at the time, and it was difficult for investigators to justify the cost. Second, parallel computing was rendered inaccessible by the need to reprogram in specialized languages. Third, while it was easy to find examples of geographic problems that involved massive volumes of data, it was harder to find ones that involved massive computation. Finally, collaborative technologies had not yet advanced to the point where it was possible for widely distributed research teams to work together productively. Many of these arguments are now moot, however. HPC is widely available, and Cloud and Grid technologies are making the transition from conventional computing almost transparent. The need for collaboration is much stronger, and the kinds of problems that used to be solved by individual investigators are no longer common. Finally, the doors have been opened to the kinds of massive computation that HPC is designed to address. Indeed, the most compelling examples of the need for HPC lie in the kinds of agent-based and cellular simulations reviewed in the previous section.

In recent years it has also become possible to parallelize processing on the desktop, following the addition of graphics processing units (GPUs) to graphics boards in order to improve the quality and speed of image rendering. Although an innovation of the computer-games market, GPU chips were subsequently adapted to more general-purpose computing: today, Nvidia (which, along with AMD, is among the world's largest graphics card manufacturers) produces chips designed specifically for non-graphics applications and provides a specialized programming-language architecture for use with them. GPUs outperform traditional computation on a central processing unit (CPU) because a GPU has a higher density of cores and uses a process called streaming to handle a number of operations simultaneously. The result is increased processing speed for computationally intensive algorithms. General-purpose computing on graphics processing units (GPGPU) describes the exploitation of the resources of the GPU for tasks that might previously have been conducted on a CPU. It has particular advantages for real-time systems, where the speed of return of results is fundamental to usability and interaction. Adnan et al. (2014) describe an application in computational geodemographics in which k-means (a frequently used algorithm in the creation of geodemographic classifications) is enhanced to run in parallel on a GPU. This work exploits the parallel-computing Compute Unified Device Architecture (CUDA), which allows code written in standard C or C++ to be used in GPU processing.
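To indicate why k-means parallelizes so naturally, the following sketch implements Lloyd's algorithm in vectorized array code: every point's distance calculations are independent of every other point's, which is the data-parallel structure that GPU streaming exploits. This is a plain NumPy illustration, not the implementation of Adnan et al. (2014); the synthetic data and parameter choices are assumptions, and the note that GPU array libraries such as CuPy expose a NumPy-like interface is offered only as one possible route to a GPU version.

import numpy as np

def kmeans(points, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point finds its nearest center; the distance
        # matrix is computed for all points at once (the data-parallel part)
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers, labels

# e.g., clustering small-area profiles into five geodemographic groups
data = np.random.default_rng(1).normal(size=(500, 4))
centers, labels = kmeans(data, k=5)
print(np.bincount(labels))   # cluster sizes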

3.2 The Social Context of GISystems

Although the technology that underpins GIScience is an established part of the information technology (IT) mainstream, there is enduring unease in some academic quarters about the social implications of this technology. Early statements were contained in Pickles' (1993) edited volume Ground Truth: The Social Implications of Geographic Information Systems, which remains an enduring statement of concerns built around four principal issues.

First, there is the view that GISystem technology is used to portray homogeneity rather than representing the needs and views of minorities, and that this arises in part because systems are created and maintained by vested interests in society. The roots of this critique can be traced to a wider debate as to whether the umbrella term GISystem is best conceived as a tool or as a science, and it is something that can be addressed through clarifying the ontologies and epistemologies of GIScience.

Second, there is the view that the use of a technological tool such as a GISystem can never be inherently neutral and that such technology is often used for ethically questionable purposes, such as surveillance and the gathering of military and industrial intelligence. Web 2.0, discussed below, has begun to address this criticism, since it has gone some way toward leveling the playing field in terms of data access and has enabled a wider cross-section of society to participate in the use of this problem-solving technology. Moreover, it is difficult to construe the views of the Earth promulgated through services such as Google and Bing as intrinsically privileged, not least because they are open to anyone with access to an Internet browser.

Third, there has been a dearth of applications of GISystems in critical research, and a preoccupation with the quest for analytical solutions rather than with establishing the impacts of human agency and social structures upon unique places. The rise of mixed-method approaches in GISystems (Cope and Elwood 2009) has gone some way toward addressing these concerns.

Finally, there is still a view in some quarters that GISystems and GIScience are inextricably bound to the philosophy and assumptions of the approach to science known as logical positivism. This implies that GIScience in particular, and science in general, can never be more than a positivist tool and a normative instrument, and cannot enrich other, more critical perspectives in geography. Although still featured in many introductory courses on social science methodologies, this critique is something of a caricature of the positivist methods that pervade scientific investigation more generally.

3.3 GIScience and Data Science

Data science has become an academic growth industry in recent years, fueled in part by massive increases in the volume and variety of data that are now available via the Internet, in part by growing practical interest in prediction using the techniques of artificial intelligence, and in part by the belief that generic approaches are available for many of the issues of data handling, among them data mining, data search, data modeling, data curation, data sharing, and data description (Kelleher and Tierney 2018).
Academic programs have been instituted in response to what is perceived as a rapidly growing market for data skills. Traditionally, information has been regarded as inherently more useful than data. Information can be defined as data that are “fit for purpose” (Longley et al. 2015); information may place data in a specific context and may refer to specific situations, processes, or objects. In that sense a geographic information science is implicitly more sophisticated than a geographic data science. But such semantic quibbles aside, it is clear that data science and geographic information science have much in common and much to learn from each other. A quick review of the syllabi of courses in data science will reveal that few give much attention to geographic information or to techniques of spatial analysis. In part this appears to be a consequence of the belief that there is nothing “special about spatial,” despite the arguments put forth in Sect. 2.

There are strong arguments for adopting the approach of data science, and specifically its mantra “let the data speak for themselves,” in addressing problems framed in space and time. Yet it is impossible to measure location perfectly, and many of the attributes commonly processed in GISystems, such as soil class or vegetation cover type, have an inherent degree of subjectivity and are thus non-replicable. In short, uncertainty is present in all geographic information (Zhang and Goodchild 2002), and few if any geographic data sets give the researcher objective knowledge of the differences between the data set and the real world (Janelle and Goodchild 2018). Instead the user must rely on indirect measures such as map scale to understand the limitations of the data.

Additional data issues arise from the repurposing of data that were never collected with the interests and concerns of researchers and analysts in mind. This is perhaps most evident in the analysis of consumer data, which arise essentially as a by-product of interactions between consumers and consumer-facing organizations in the course of the supply of goods or services, such as social media. Such data account for a large and increasing share of all the data collected about citizens today, yet the absence of monopoly providers of most goods and services means that issues of self-selection and bias plague the reuse of such data for research purposes (Longley et al. 2018). Triangulation of such sources against existing framework data sources such as censuses presents one route beyond this impasse – such sources may be less rich, granular, or frequently collected, but they may suggest ways in which bias can be accommodated or research refocused.

Prediction has been a major driving force in the growth of data science and in the investments being made in this area by major corporations. Tools of machine learning, including artificial neural nets and deep learning, have been shown to be very effective in making successful predictions from highly voluminous and diverse data sets. Yet despite its practical value, prediction has always taken a back seat in science to explanation and understanding. A trained neural network is difficult to interpret within the kinds of hypothesis testing and theory confirmation that have characterized much of science to date. Moreover, it is difficult to see how successful these techniques can be in achieving generalizability across study areas, a key requirement of GIScience.

3.4 Neogeography, Wikification, Open Data, and Consumer Data

Recent years have seen the reuse of the term neogeography to describe the developments in Web mapping technology and spatial data infrastructures that have greatly enhanced our abilities to assemble, share, and interact with geographic information online. Allied to this is the increased crowd-sourcing by online communities of volunteered geographic information (VGI: Goodchild 2007) and user-generated content (UGC). As such, neogeography is founded upon the two-way, many-to-many interactions between users and websites that have emerged under Web 2.0, as embodied in projects such as Wikimapia (www.wikimapia.org) and OpenStreetMap (www.openstreetmap.org). Today, Wikimapia contains user-generated entries for more places than are available in any official list of place names, and the term vernacular region is used to describe regions which emerge from computational analysis of feeds from social networking sites. OpenStreetMap is well on the way to creating a free-to-use global map database through assimilation of digitized satellite photographs with GPS tracks supplied by volunteers. This has converted many new users to the benefits of creating, sharing, and using geographic information, often through ad hoc collectives and interest groups. Such sites go some way to alleviating concerns about the social implications of GISystems, insofar as participation in the creation and use of GISystem databases is not restricted, and the contested nature of place names and other characteristics can be tagged in publicly editable databases. As such, Web 2.0 simultaneously facilitates crowd-sourcing of VGI while making basic GISystem functions increasingly accessible to an ever-broader community of users.

Geographers have long recognized the importance of place in humans’ understanding of geography. The average person is familiar with thousands or even tens of thousands of named places and their associations. Yet GISystems require named places to be represented as precisely located points, polylines, or polygons. It is a rare individual who knows even roughly the latitude and longitude of his or her home, yet everyone knows their home’s street address and postal code. Recently there has been much interest in a platial approach to geographic knowledge that emphasizes named places as referents, as a more human-centric alternative to the familiar spatial approach with its emphasis on coordinate referents. New data sources, including social media, provide a rich basis for exploring associations of place.

Official data are also becoming available through renewed pressures for government accountability, and the broader realization that wide availability of data collected by government and pertaining to citizens can lubricate economic growth. The result has been a plethora of open-data initiatives in many developed countries, leading to Web-based dissemination of data relating to many areas of public concern, such as personal health, transport, property prices, and even the weather. The previous section described the increasing reuse of consumer data for research purposes, arising out of initiatives that develop common ground in spatial analysis from shared problems of commerce, government, and academia.


Conventional official sources such as censuses of population today account for a very much smaller proportion of the data that are collected about citizens, and there is a sense in which open and consumer data initiatives are playing catch-up – providing researchers and analysts with some facility with which to understand the increasingly diverse and complex social, economic, and demographic milieu that characterizes advanced societies. Despite the hubris that has been generated around open and consumer data initiatives, however, most of the data sources that have been released present extremely partial and disconnected representations of the world. For reasons set out in the discussion of modifiable areal unit effects above, the much more holistic concerns with issues of choice and service delivery, or the localism agenda in general, require linked characteristics at the level of the individual citizen or at the very least small neighborhood units. These initiatives bring new analytical focus to anonymization and data disclosure prevention procedures, which in principle may lead to release of data pertaining to individuals rather than geographic aggregations.

All of this requires clear thinking about issues of spatial resolution (level of detail) and disclosure control that are central to the wider spatial literacy agenda (Janelle and Goodchild 2011). It may be impossible to guarantee privacy in GISystems, a consideration that is likely to reignite aspects of social critique. Open and consumer data initiatives are creating the need for a broader policy framework for data that responds to concerns of citizen privacy and confidentiality while remaining cognizant of the benefits that can accrue through opening up, integrating, and using the contents of government data silos. What level of data degradation is an informed public likely to be happy with, if it can be shown to bring benefits in terms of efficient and effective provision of public and private goods?

A related issue is that empowerment of the many to perform basic (and even advanced) GISystem operations brings new challenges in ensuring that tools are used efficiently, effectively, and safely. Whether using official statistics or VGI, Web 2.0 can never be more than a partial and technological substitute for understanding of the core organizing principles and concepts of GIScience. These highlight the need to know and specify the basis of inference from the partial representations that are used in GISystems to the world at large; yet such information is conspicuous by its absence from many VGI sources.

4 Conclusion

In undertaking a wide-ranging review of the achievements of GIScience, this chapter has also set out the principal issues and challenges that face these fields today. Improved computation and the facility to create, concatenate, and conflate large datasets will undoubtedly guide the future trajectories of the fields in the short to medium term. Ultimately, though, our focus in this chapter has been upon changes in scientific practice that may appear mundane but are nonetheless profound and far-reaching. Good science is relative to what we have now, and improved understanding of data and their provenance is a necessary precursor to better analysis of spatial distributions in today’s data- and computation-rich world.


Ultimately, GIScience is an applied science of the real world and in large part will be judged upon the success of its applications. Improved methods and techniques can certainly help, as can ever-greater processing power. Yet the experience of the last 20 years suggests that there are rather few purely technical solutions to substantial real-world problems. The broader challenge is to address the ontologies that govern our conception of real-world phenomena and to undertake robust appraisal of the provenance of data that are used to represent the world using GISystems. This argues that the practice of GIScience poses fundamental empirical questions that require place or context to be understood as much more than location. Scientific approaches to representing places will undoubtedly benefit from the availability of new data sources and novel applications of existing ones, as well as citizen participation in their creation and maintenance. Yet a further quest for GIScience is to develop explicitly geographical representations of the accumulated effects of historical and cultural processes upon unique places.

References

Adnan M, Longley PA, Singleton AD (2014) Parallel processing architectures of GPU: applications in geocomputational geodemographics. In: Abrahart R, See L (eds) GeoComputation, 2nd edn. Taylor and Francis, London
Anselin L (1989) What is special about spatial data? Alternative perspectives on spatial data analysis. Technical paper 89-4. National Center for Geographic Information and Analysis, Santa Barbara
Anselin L (1995) Local indicators of spatial association – LISA. Geogr Anal 27(2):93–115
Batty MJ, Longley PA (1994) Fractal cities: a geometry of form and function. Academic, San Diego
Clarke KC, Gaydos L (1998) Loose coupling a cellular automaton model and GIS: long-term growth prediction for San Francisco and Washington/Baltimore. Int J Geogr Inf Sci 12(7):699–714
Cope M, Elwood S (2009) Qualitative GIS: a mixed methods approach. Sage, Thousand Oaks
Duckham M, Goodchild MF, Worboys MF (2003) Foundations of geographic information science. Taylor and Francis, New York
Egenhofer MJ, Franzosa RD (1991) Point-set topological spatial relations. Int J Geogr Inf Syst 5(2):161–174
Evans TP, Kelley H (2004) Multi-scale analysis of a household level agent-based model of land cover change. J Environ Manag 72(1–2):57–72
Fotheringham AS, Brunsdon C, Charlton M (2002) Geographically weighted regression: the analysis of spatially varying relationships. Wiley, Hoboken
Goodchild MF (1992) Geographical information science. Int J Geogr Inf Syst 6(1):31–45
Goodchild MF (2007) Citizens as sensors: the world of volunteered geography. GeoJournal 69(4):211–221
Hey AJG, Tansley S, Tolle KM (2009) The fourth paradigm: data-intensive scientific discovery. Microsoft Research, Redmond
Janelle DG, Goodchild MF (2011) Concepts, principles, tools, and challenges in spatially integrated social science. In: Nyerges TL, McMaster R, Couclelis H (eds) The SAGE handbook of GIS and society. Sage, Thousand Oaks, pp 27–45
Janelle DG, Goodchild MF (2018) Territory, geographic information, and the map. In: Wuppuluri S, Doria FA (eds) The map and the territory: exploring the foundations of science, thought and reality. Springer, Dordrecht, pp 609–628
Kelleher JD, Tierney B (2018) Data science. MIT Press, Cambridge, MA
LeSage J, Pace RK (2009) Introduction to spatial econometrics. CRC Press, Boca Raton/London/New York
Longley PA, Goodchild MF, Maguire DJ, Rhind DW (2015) Geographic information science and systems, 4th edn. Wiley, Hoboken
Longley PA, Cheshire JA, Singleton AD (2018) Consumer data research. UCL Press, London
Maguire DJ, Longley PA (2005) The emergence of geoportals and their role in spatial data infrastructures. Comput Environ Urban Syst 29(1):3–14
National Science Foundation (2003) Report of the blue-ribbon advisory panel on cyberinfrastructure. National Science Foundation, Washington, DC
Openshaw S (1983) The modifiable areal unit problem. GeoBooks, Norwich
Openshaw S, Abrahart RJ (1996) Geocomputation. In: Abrahart RJ (ed) Proceedings, first international conference on GeoComputation, University of Leeds, pp 665–666
Ord JK, Getis A (1995) Local spatial autocorrelation statistics: distributional issues and applications. Geogr Anal 27(4):286–306
Parker DC, Manson SM, Janssen MA, Hoffmann MJ, Deadman P (2003) Multi-agent systems for the simulation of land-use and land-cover change: a review. Ann Assoc Am Geogr 93(2):314–337
Pickles J (ed) (1993) Ground truth: the social implications of geographic information systems. Guilford Press, New York
Shreve RL (1966) Statistical law of stream numbers. J Geol 74:17–37
Tobler WR (1970) A computer movie simulating urban growth in the Detroit region. Econ Geogr 46(2):234–240
Torrens PM, Li X, Griffin WA (2011) Building agent-based walking models by machine learning on diverse databases of space-time trajectory samples. Trans Geogr Inf Sci 15(s1):67–94
Wang S (2016) CyberGIS and spatial data science. GeoJournal 81(6):965–968
Wilson AG (1970) Entropy in urban and regional modelling. Pion, London
Wright DJ, Goodchild MF, Proctor JD (1997) Demystifying the persistent ambiguity of GIS as ‘tool’ versus ‘science’. Ann Assoc Am Geogr 87(2):346–362
Zhang JX, Goodchild MF (2002) Uncertainty in geographical information. Taylor and Francis, New York

Geospatial Analysis and Geocomputation: Concepts and Modeling Tools

Michael de Smith

Contents
1 Introduction
2 Geocomputation and Spatial Analysis
3 Geocomputational Models Inspired by Biological Analogies
4 Networks, Tracks, and Distance Computation
5 Computational Spatial Statistics
6 Conclusions
References

Abstract

This chapter provides an introduction to geocomputation and geocomputational methods. As such it considers the scope of the term geocomputation, the principal techniques that are applied, and some of the key underlying principles and issues. Chapters elsewhere in this major reference work examine many of these ideas and methods in greater detail. In this connection it is reasonable to ask whether all of modern spatial analysis is inherently geocomputational; the answer is without doubt “no,” but its growing importance in the development of new forms of spatial analysis, in exploration of the behavior and dynamics of complex systems, in the analysis of large datasets, in optimization problems, and in model validation remains indisputable.

Keywords

Geographical information system · Cellular automaton · Traveling salesman problem · Geographically weighted regression · Dynamic optimization problem

M. de Smith (*) Department of Geography, University College London, London, UK e-mail: [email protected] © Springer-Verlag GmbH Germany, part of Springer Nature 2021 M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_62

1 Introduction

For many researchers the term geocomputation refers to “the art and science of solving complex spatial problems with computers.” This definition, which is the tag line of the academic conference series run under the banner “geocomputation.org,” captures the essence of the term geocomputation. As such it embraces all manner of concepts, tools, and techniques that form part of mainstream geographical information systems (GIS) and science (see chapter ▶ “Geographic Information Science” by Goodchild and Longley, this volume) and the methods employed in spatial analysis. For example, the 2011 Geocomputation conference included, among other topics, sessions entitled Geodemographics, Genetic Algorithms and Cellular Automata Modeling, Agent-Based Modeling (ABM), Geostatistics, Space-Time Modeling and Analysis, Network Complexity, Machine Learning, GeoVisual and Terrain Analysis, and Geographically Weighted Regression. Thus, parts of the academic world currently use the term geocomputation to apply to a very wide range of spatial analysis and modeling procedures, particularly those for which computational resources are central to the techniques employed. Readers will have noticed that each of the above topics, with the exception of machine learning, is represented by one or more chapters in this major reference work.

As Fischer and Leung (2001, Prologue) have noted, geocomputation may also be viewed as a research paradigm that has changed the view of research practice in geospatial science over the past two decades. The driving forces behind the paradigm are fourfold: first, the increasing complexity of our spatiotemporal systems (nonlinearity, uncertainty, discontinuity, self-organization, and continual adaptation); second, the need to develop new ways of utilizing and handling the increasingly large amounts of spatial information from the GIS and remote-sensing revolutions; third, the availability of attractive computational (intelligence) technologies which provide the modeling tools; and finally, the advent of high-performance computers.

As individual analytical methods that are considered geocomputational become accepted as providing effective solutions to specific spatial analysis problems, so they start to appear in more generic software with widespread usage. This is particularly apparent in some areas of remote-sensing data analysis, visualization tools, and agent-based modeling and in association with a number of statistical and spatial optimization problems. Thus, we are led to the conclusion that geocomputation is often used as an umbrella term for approaches to the analysis of problems that have a specifically geographical or environmental data focus, use modern computational techniques, and leverage high-performance computing hardware. This remains consistent with Openshaw’s original vision for the discipline (Openshaw and Abrahart 2000, p. x). As such, therefore, geocomputation is not a fixed set of techniques and models but an evolving, shifting collection of ideas, computational tools, and techniques with a particular emphasis upon the spatial domain, at scales from the architectural to the global. Modern geocomputational research remains true to the objectives Openshaw laid out more than a decade ago as being “... all about the use of relatively massive computation to tackle grand challenge (viz. almost impossible to solve) [geo]problems of immense complexity” (Openshaw and Abrahart 2000, p. 9).


In recent years many fields of scientific research have become dependent upon computational rather than so-called analytical methods. This is particularly apparent in mathematics, in areas such as combinatorial analysis, network analysis, and much of modern statistics. Likewise, developments in GIS software and remote sensing have relied very heavily on advances in computer memory, disks, processor speed, processing architectures, and visual display technology in order to provide the tools that are regarded as essential for digital mapping and related tasks. The challenge for geocomputational techniques is to provide real added value by providing practical tools that help solve complex problems, extract essential information from large datasets, and enhance the understanding of spatial processes and outcomes. Such techniques and modeling exercises may not be simple or parsimonious, but frequently they have the merit of providing more meaningful outcomes that match real-world experiences more closely than traditional analytical models. The real world (physical and social) is intrinsically complex, dynamic, and often unpredictable, and thus computational spatial modeling tools that embrace this complexity can make a powerful contribution to our understanding of how the world works.

2 Geocomputation and Spatial Analysis

In our introductory section, we noted that geocomputation is a term that is applied to computationally demanding methods of spatial analysis. By definition, problems that are addressed by geocomputational methods push the boundaries of computing technology. Since this is a rapidly changing field, with continuous improvements in processing power, memory availability and speed, networking, and storage capacity, yesterday’s geocomputational problem may well become today’s standard procedure. Problems that previously had to be run on limited-size datasets, or with severe limitations on the spatial and temporal resolutions employed, can in due course be run with much finer resolution datasets, with more realistic assumptions, and with many repetitions if so required.

Specialized computing architectures can be leveraged very effectively for many spatial problems – for an excellent recent discussion of this field, see Huang et al. (2011). For example, it is becoming straightforward to use multiple processors on single computers or across multiple computers (e.g., using grid networks or cloud computing; Adnan et al., this volume) to run many simulations in parallel that are identical other than with respect to their initial and boundary conditions. Likewise, classification techniques, such as k-means clustering, can readily be implemented using parallel processing, since the procedure involves a series of similar runs for k = 2, ..., 20, say, iterated for a set of (randomly) chosen initial seed locations (a minimal sketch of this pattern closes this section). In other instances the software application itself requires substantial redesign in order to operate effectively across multiple processors and memory regions. This is particularly true for data-intensive applications, where the dimensionality of the problem may be limited but the volumes extremely large (e.g., very high-resolution satellite imagery and LiDAR (Light Detection And Ranging) point clouds, or the determination of optimal parameters for fine-resolution spatial interaction models). In such cases splitting the problem into smaller units (e.g., decomposition of the data into tiles or cells), processing the separate units in parallel, and then merging the results may be the only practical approach that can be adopted.

With the rise of increasingly large spatial datasets, almost all aspects of spatial data processing become a challenge, but with geocomputation the focus is upon problems that resist classical forms of analysis. One of the clearest examples of this arises in the field of spatial simulation (often now referred to as geosimulation – see the work of Paul Torrens at http://geosimulation.org/). Although there are a variety of approaches to such simulation – for example, cellular automata, agent-based modeling, and randomized network-based modeling – the results obtained at the macroscale from microscale (or bottom-up) modeling are often unexpected and unpredictable. Macroscale structures and behaviors “emerge” from simple microscale processes, in a similar manner to the emergence of macroscale features in biological systems – such as the appearance of segmented body structures and articulated limbs in a wide range of insects and higher life forms and the appearance of “swarms” or flocking behavior among birds, fish, and many other animals. In geospatial analysis, examples include modeling the behavior of pedestrians during evacuation emergencies, examining the way in which disease spreads among communities, and modeling and predicting urban land use and transport changes over time.

Microscale or “bottom-up” simulation is only one of a number of major application areas for geocomputational methods. Other examples include computational spatial statistics – procedures that seek to provide statistically valid insights into complex spatial datasets; optimization problems, ranging from optimal location and routing problems to the determination of optimal model parameters for highly parameterized models; procedures that seek to augment data at a given scale and point in time with related datasets at different levels of aggregation and/or extending over a number of time periods; and a wide range of advanced visualization techniques. In many instances these issues are not separate, independent concerns but apply simultaneously to the iterative process of model building, thereby demanding considerable skill in model construction, software engineering, data management, and validation. The results of such work are often surprising and impressive but may also be difficult to follow in detail and hard to justify. Is understanding always advanced when such methods are employed, or do such models match the past and present so well because they are highly parameterized and intensively fitted?

In the sections that follow, we commence in Sect. 3 by examining a number of geocomputational methods that have been inspired by analogy with biological processes. These include cellular automata and agent-based models, computational neural networks, and evolutionary algorithms. In Sect. 4 we then discuss a number of geocomputational techniques that have been applied to network-related problems rather than point- or areal-based spatial problems. In Sect. 5 we look at the rising importance of computational methods in statistical science and the impact of this development on the field of spatial statistics. We conclude in Sect. 6 by commenting upon issues of spatial and temporal resolution and questions of model complexity.
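Before moving on, the embarrassingly parallel pattern mentioned above can be made concrete. The following is a minimal sketch of a k-means sweep over k = 2, ..., 20 run across multiple processes; the synthetic coordinates and the use of scikit-learn's KMeans are assumptions made purely for the example:

```python
# A minimal sketch of an embarrassingly parallel k-means sweep
# (illustrative only; assumes scikit-learn is installed).
from multiprocessing import Pool

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
points = rng.random((10_000, 2))  # synthetic x, y coordinates

def fit_kmeans(k):
    # Each run is independent: k clusters, several random restarts.
    model = KMeans(n_clusters=k, n_init=10, random_state=k).fit(points)
    return k, model.inertia_  # within-cluster sum of squares

if __name__ == "__main__":
    with Pool() as pool:  # one worker process per core
        results = pool.map(fit_kmeans, range(2, 21))
    for k, inertia in results:
        print(f"k={k:2d}  inertia={inertia:,.1f}")
```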

3 Geocomputational Models Inspired by Biological Analogies

The analogy with biological processes runs far deeper in the field of geocomputation than might be expected. Three major classes of methods commonly applied in geocomputation have been inspired by biological analogies (see, further, de Smith et al. 2009, Chap. 8). The first of these embraces a number of simulation techniques, many of which owe their origins to a simple cellular automaton (CA) simulation known as “the Game of Life.” This extremely simple model, which operates within the framework of a 2D matrix of square cells, was introduced by the mathematician John Conway in the 1970s and has a surprisingly extensive and complex range of outcomes. The Game of Life has only two states for each cell (alive or dead, or 1 or 0) and three simple state transition rules:

(i) survival: if a cell is “alive” (i.e., its state is “true” or “1,” for instance) and it has two or three alive neighbors, it remains alive;
(ii) reproduction: if a cell is “dead” but has three alive cells within its neighborhood, its state becomes alive; and
(iii) loneliness (fewer than two alive neighbors) or overcrowding (more than three alive neighbors): the cell dies.

Despite these simple rules, the Game of Life can produce a range of complex behaviors from different initial conditions. This archetypal spatial model demonstrates an important characteristic of many geocomputational methods – the emergence of broad classes of pattern that could not have been predicted prior to simulation. It also highlights a number of other common features of such procedures: the outcomes of geosimulation exercises have an element of framework dependency (structures and boundaries) and, in some instances, path dependency (i.e., the future path is heavily determined by past steps and initial conditions and may lead to stable or unstable/widely divergent results). Hence, key structural attributes of each type of simulation model have a direct bearing on the kinds of outcome seen. These attributes (for problems of the cellular automata type) include the state variables (the possible states the system supports and the set of initial states that are examined); the spatial framework used for modeling (e.g., use of a 2D rectangular grid, a 3D lattice, a toroidal space, etc.); the form of neighborhood effects permitted (e.g., single or multiple steps, distance bands, different forms of adjacency); the state transition rules applied (i.e., how the states may change over time); and finally, how the time dimension is treated – discrete or continuous, serial or parallel.

Wolfram (1983) studied the Game of Life in detail and demonstrated that the outcomes for all simulations of this general type fall into four classes. For the Game of Life, these are (i) static patterns, (ii) oscillators (or periodic patterns), (iii) spaceships (patterns that repeat themselves but translated in space), and (iv) patterns that increase in population size (a range of different patterns and behaviors). A useful summary with examples and references is available on the Wikipedia page “Conway’s Game of Life.” The unexpected complexity of the behavior resulting from such simple rules led Wolfram to believe that complexity in nature may be due to similar mechanisms.
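As an illustration, the three transition rules above can be expressed in a few lines of code. The following is a minimal sketch on a fixed grid with edges treated as dead cells; the grid size and starting pattern are arbitrary choices for the example:

```python
import numpy as np

def life_step(grid):
    """One generation of Conway's Game of Life on a 2D 0/1 array."""
    # Count the eight neighbors of every cell (edges padded with zeros).
    padded = np.pad(grid, 1)
    neighbors = sum(
        padded[1 + di : 1 + di + grid.shape[0], 1 + dj : 1 + dj + grid.shape[1]]
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0)
    )
    # Survival: alive with 2 or 3 neighbors; reproduction: dead with exactly 3.
    return ((grid == 1) & ((neighbors == 2) | (neighbors == 3))) | (
        (grid == 0) & (neighbors == 3)
    )

# A "glider" -- one of the "spaceship" patterns -- on a 10 x 10 grid.
world = np.zeros((10, 10), dtype=int)
world[1, 2] = world[2, 3] = world[3, 1] = world[3, 2] = world[3, 3] = 1
for _ in range(4):
    world = life_step(world).astype(int)
print(world)  # after 4 generations the glider has moved one cell diagonally
```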


Batty (2000) provides an excellent review of CA models and their application to urban systems modeling, including commentary on their strengths and weaknesses, and identifies areas for future research in this field. Central among these are questions of improved representation of the real world, moving from the fixed framework of square cells to incorporate irregular zones, streets, and other linear forms, and new definitions of how such elements interact. To an extent, these issues have been tackled in recent years using an alternative geosimulation framework, known as agent-based models (ABMs).

Commencing with this simple cellular-based simulation framework, computer scientists have developed a large and expanding family of techniques and tools based on microscale or bottom-up simulation. Among these are a range of approaches that liberate the simulation from the spatial constraints of a lattice-based framework and permit free movement in 2D or 3D space (agent-based models). One example is the so-called ant colony optimization (ACO) approach, in which the space of interest is explored by synthetic ants (the agents in this case) – those that reach a desirable objective (e.g., a target location/food source) by any route then return to the ant colony by a relatively direct route, laying a synthetic pheromone trail. Subsequent ants are attracted to the pheromone trail, reinforcing its usage. But because the pheromone evaporates over time, only the route or routes that are most used tend to be retained, and since shorter routes lead to and from the food source more quickly, these tend to be reinforced at the expense of longer (i.e., less desirable) routes. ACOs are a form of agent-based model (where the ants are the agents), and because of the way such systems are implemented, with large populations of independent agents acting in a collective manner, ACOs and similar procedures fit within the broad area known as “swarm intelligence.” Other microsimulation procedures within this broader paradigm have been applied with considerable success to crowd behavior modeling, for example, helping to manage large-scale street events such as carnivals and protest marches, football stadiums, and the emergency evacuation of complex buildings.
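A minimal sketch of the pheromone mechanism just described is given below, on a tiny invented graph with two routes from nest to food; the edge lengths, evaporation rate, and deposit rule are all arbitrary choices for the example:

```python
import random

random.seed(1)

# A small graph: edge lengths between nodes (invented for the example).
edges = {
    ("nest", "A"): 2.0, ("nest", "B"): 2.5,
    ("A", "food"): 2.0, ("B", "food"): 1.0,
}
length = {**edges, **{(b, a): d for (a, b), d in edges.items()}}
pheromone = {e: 1.0 for e in length}

def walk(start, goal):
    """One ant walks from start to goal, never revisiting a node."""
    path, node, visited = [], start, {start}
    while node != goal:
        options = [(a, b) for (a, b) in length if a == node and b not in visited]
        edge = random.choices(options, [pheromone[e] for e in options])[0]
        path.append(edge)
        node = edge[1]
        visited.add(node)
    return path

for _ in range(200):
    path = walk("nest", "food")
    tour = sum(length[e] for e in path)
    for e in pheromone:
        pheromone[e] *= 0.95          # evaporation
    for (a, b) in path:               # deposit inversely to route length
        pheromone[(a, b)] += 1.0 / tour
        pheromone[(b, a)] += 1.0 / tour

top = sorted(pheromone, key=pheromone.get, reverse=True)[:2]
print("strongest trails:", top)  # edges of the shorter nest-B-food route
```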
Note that in these more advanced geocomputational models, a clear distinction exists between the population (agents in this case) and the environment. This is not the case with cellular automata. There is also the inherent assumption that while the population responds to the environment, the reverse is generally not the case – for some applications (e.g., those involving anthropogenic change), such assumptions are not realistic.

Recently a number of urban geosimulation models have sought to reflect real-world spatial structures and their attributes within the model framework. In these models the spatial framework utilizes fine-scale digital maps of the study area combined with synthetically generated population data. A synthetic population is a computer-generated randomized set of individuals, households, or other entities of interest that match predefined aggregate attributes (or classifications) for zones within a study region. Such populations are often used as the building blocks for microsimulation projects of the kind described above, where the necessary individual-level data do not exist and a match to “ground truth” information is required (e.g., in land use planning, a traffic simulation model, a medical study, or even a model of burglary events that incorporates the potential victims of crime).
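A minimal sketch of the core idea – drawing a synthetic population whose aggregate attributes reproduce published zonal counts – is shown below. The zone totals and attribute rates are invented for the example; real population synthesizers (such as those compared by Heppenstall et al. 2011) use far more sophisticated fitting procedures:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented small-area census counts: persons per zone by age band.
zone_age_counts = {
    "zone_A": {"0-15": 120, "16-64": 540, "65+": 90},
    "zone_B": {"0-15": 200, "16-64": 610, "65+": 140},
}

synthetic_population = []
for zone, counts in zone_age_counts.items():
    for age_band, n in counts.items():
        for _ in range(n):
            # Each synthetic person reproduces the zonal marginals exactly;
            # unobserved attributes are drawn from an assumed distribution.
            person = {
                "zone": zone,
                "age_band": age_band,
                "car_owner": bool(rng.random() < 0.45),  # assumed rate
            }
            synthetic_population.append(person)

print(len(synthetic_population))  # 1,700 synthetic individuals
```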


Typically the finest level of available zonal data (such as small-area census zones) is used to specify the attributes of the synthetic population. The aggregated characteristics of this computer-generated population are chosen to ensure that they closely approximate the zonal figures. A recent comparison of procedures for achieving such approximations is provided by Heppenstall et al. (2011), whose tests identify the use of simulated annealing (SA) as the most effective (if relatively slow) procedure for minimizing classification error. The synthetic individuals are then allocated to particular locations within the study area zones. This might be a process of random allocation to known buildings or by reference to land use maps. In many instances the most practical option is to rasterize the zones and to allocate the synthetic population to cells based on a probabilistic assignment derived from land use. The result is a representative population of individuals, allocated to meaningful point or cell locations (thereby minimizing spatial aggregation bias), that may be used within a microsimulation model. This lends such models credibility and a level of realism derived from the knowledge that their aggregate characteristics match those that are known, even though they do not strictly represent the actual set of individuals within the study area. Thus, a form of “study error” remains, but the approach goes some way to resolving the ever-present issues of statistical and spatial aggregation associated with many datasets.

The second main area of biologically derived geocomputational methods is the field known as computational neural networks (CNNs). These methods take their inspiration from highly simplified models of neurological processes, originally drawing on models of how neurons in the retina operate. In practice the analogy has proved useful in guiding the general structure of computational models rather than in any direct appeal to biological process similarity. CNN methods have been successfully applied to a limited number of geospatial problems – most notably the modeling of trip distribution data (see, e.g., Fischer 2006) – and to multispectral image analysis. Typically three-layer models have been applied, with an input layer, a single middle layer, and an output layer. CNNs are models whose behavior is determined in part by the model structure itself and in part by the data used to “train” the model parameters. Once the structure has been specified by the research scientist (which may be an iterative process in itself) and training data selected, applied, and evaluated, the result is a modeling system that can be used on more general “unseen” datasets. As with the previous discussion of systems inspired by artificial life, the outcomes achieved are strongly dependent on a number of well-defined attributes of the model: the number of layers used in the neural model, the number of nodes used in each layer, the form of the forward and backward propagation algorithms, and the connectivity of the network. This dependency on the structure of the model is then accentuated by an element of data dependency since the selection of the training dataset and control (evaluation) dataset defines the parameters that are then used on unseen data. CNNs have found their most widespread geospatial application in the field of remote sensing, where they have been found to be a very effective tool for inferring land use and land use change.
However, the nature of such models suggests that they operate primarily as a form of pattern recognition engine – a relatively highly parameterized black box that, like higher animals, can be very good at recognizing familiar shapes and textures but by itself contributes little to understanding, and can thus be quite specific to the datasets to which any particular model is applied. In the longer term a combination of high-speed automated CNNs with “intelligent systems” may produce much more generic and flexible tools that offer a real step forward in the efficient processing of large spatial datasets.

The third biological analogy we discuss here is “survival of the fittest.” Here the objective is to solve a range of difficult combinatorial optimization problems using analogies from genetics. There is a wide range of important optimization problems for which it is believed that no solution algorithm exists that can be run in polynomial time – such problems are termed NP-hard. What this means in practice is that for problems that involve any realistic quantities of data (i.e., n objects, where n is not small), it is impossible to find a provably globally optimal solution in an acceptable amount of time. As n increases, so the problem becomes increasingly intractable. The archetypal example of such a problem is the simple traveling salesman problem (TSP). The TSP involves finding the optimal route through a set of points (e.g., towns) that ensures each is visited just once and the total length of the tour is minimized. The best algorithms for solving the simple TSP can now provide provably optimal solutions for n in the thousands, which is very impressive.

Genetic algorithms (GAs) can be used to “solve” the TSP but are far less efficient than many other approaches – that is, GAs typically produce poorer results (suboptimal solutions) and take longer to achieve these (many thousands of iterations or “generations”). One reason for this is that GAs are a generic rather than a specific approach to problem solving – they can often provide a reasonably good answer rather than the best answer. And this raises another interesting question – is the optimal solution the best solution? Initially we would immediately respond “of course,” but in many instances this perspective must be qualified by recognizing that it applies to the problem as specified, which is typically very tightly specified and generally static. If the kinds of problem to be tackled are less well specified and dynamic or involve more complex choice models, it is possible that apparently suboptimal computational methods might well fare far better than fixed algorithms.

GAs are a particular subset of a broader class of algorithms known as evolutionary algorithms (EAs). EAs have been applied to a variety of dynamic optimization problems with some success. This raises the question of what is meant by “dynamic.” Clearly all problems exist in a temporal context but are only considered dynamic if some aspect of the problem changes over time. Key issues are therefore the frequency and scale of any such changes; their behavior (e.g., predictable or random, trending or cyclical, or some combination of these); and the nature of the changes taking place – are the relevant dynamic elements changes in the environment, the objective function, or perhaps the constraints that affect the system behavior?
In a static optimization problem, a traditional genetic algorithm attempts to converge toward a global optimum by a process of genetic selection – retaining genetic strings that are fitter (provide solutions closer to the optimum, as measured by some fitness function) and modifying genes by various forms of mutation and crossover. Since these strategies are designed to migrate the solution toward a static optimum, they are too restrictive for problems that include dynamics. Dynamic optimization problems require modification of such survival strategies by taking into account the nature of the changes that may occur. In some instances such changes are so dramatic and unpredictable that attempts at developing solution procedures will never succeed, but in many instances changes are smoother or have a more predictable effect (e.g., reducing the severity of some constraints), and in these cases dynamic EA procedures can be used. Example approaches that have been found to be effective include the concept of introducing random immigrants into the population, helping to maintain genetic diversity, and the so-called forking of populations, in which an entire subset (e.g., a selection of children) is separated off and explores its own evolutionary path, transposed from the parent population (a form of emigration).
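To make the genetic machinery described above concrete, the following sketch applies a deliberately simple GA to a small static TSP. The population size, selection scheme, mutation rate, and random city coordinates are all arbitrary choices for illustration; as noted in the text, specialized TSP algorithms would comfortably outperform this approach:

```python
import math
import random

random.seed(0)
cities = [(random.random(), random.random()) for _ in range(25)]

def tour_length(tour):
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def crossover(a, b):
    # Order crossover: copy a slice from parent a, fill the rest in b's order.
    i, j = sorted(random.sample(range(len(a)), 2))
    child = a[i:j]
    child += [c for c in b if c not in child]
    return child

def mutate(tour, rate=0.05):
    # Swap mutation: exchange two cities with small probability.
    tour = tour[:]
    for i in range(len(tour)):
        if random.random() < rate:
            j = random.randrange(len(tour))
            tour[i], tour[j] = tour[j], tour[i]
    return tour

population = [random.sample(range(25), 25) for _ in range(100)]
for generation in range(500):
    population.sort(key=tour_length)   # fitness = shorter tour
    survivors = population[:20]        # truncation selection
    population = survivors + [
        mutate(crossover(*random.sample(survivors, 2)))
        for _ in range(80)
    ]
print(f"best tour length: {tour_length(min(population, key=tour_length)):.3f}")
```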

4 Networks, Tracks, and Distance Computation

Many of the techniques described above apply to point and areal datasets (zonal or lattice-based), although some relate to networks. Until recently, attention on linear forms has tended to concentrate on a range of network optimization problems, such as determining shortest paths, least-cost routes, multivehicle (capacity-constrained) route allocation, optimal on-network facility location, and optimal arc routing (e.g., street cleaning) – see de Smith et al. (2009), Chap. 7 for a detailed review of this area. In many cases very fast algorithms have been obtained to solve such problems, and in some cases these solutions are known to be optimal. In other cases achieving a provably optimal result may be impossible and/or require unlimited computing resources for large problems (i.e., problems with many links and nodes). As a result a wide range of suboptimal but effective procedures have been developed, which provide very acceptable if not optimal solutions. Among these are highly problem-specific algorithms (e.g., the A* algorithm for determination of the shortest path in a network), various forms of linear programming and dynamic programming, and much more generic procedures, such as genetic algorithms (GAs), cellular neural networks (CNNs), simulated annealing (SA) algorithms, ant colony optimization (ACO) methods, and many more.

In most cases these models and methods seek optimal or near-optimal solutions to complex routing problems in static network environments. However, it is increasingly apparent that solution procedures need to reflect the dynamics of real-world networks, where road traffic or telecommunication traffic intensity and connectivity may vary rapidly in time and space (see chapter ▶ “Spatio-temporal Data Mining” by Cheng, this volume). Thus, geocomputational systems need to be able to respond in near real time to events as they occur, particularly when applied in command and control situations. The emergence of new forms of data flows, most notably from individuals (people, animals) or vehicles on the move and from satellite tracking and sensing, has led to a substantial new body of spatial data that demands our attention. In some instances these data need to be fed directly into the processing systems in order to identify problems and direct changes to the way these systems manage and forecast near-term events (e.g., traffic routing, crowd control, evacuation management), while in other cases the data can be seen as a complex series of tracks, marking out the routes chosen by numerous individuals or other entities over time. This tracking information, often generated by communications devices that incorporate Global Positioning System (GPS) or similar technologies, provides an entirely new body of complex spatial data that demands new forms of analysis to interpret and leverage results. Research into the use of such data is at an early stage, with initial applications in fields such as attempting to predict animal behavior (e.g., migration routes, preferred habitat selection), modeling the behavior of would-be burglars, and analyzing the tracks of hurricanes.

Accurate computation of distance is a fundamental requirement of almost every form of geocomputation, but in many instances approximations are used that may result in systematic errors. The main issues are the use of Euclidean measure when network-based measures are more appropriate and the use of local Euclidean distances when undertaking operations across raster files (see further below). These issues become more complex when constraints are added, for example, when restricted areas are included, barriers and turn restrictions are accounted for, and where the underlying space is no longer treated as being homogeneous. In the first case, Euclidean measure may substantially underestimate interpoint distances. Computation of network distances across dense street networks can be time consuming, with the result that some GIS operations, such as calculating the extent of drive-time polygons or computing optimal facility locations, may be very slow. Where distance computations are not based on networks and are computed using Euclidean, spherical, or geodesic measure, they can be calculated extremely quickly. In some instances such measures may be used as a surrogate for network distance or as a means of generating good “candidate lists” of solutions which can then be used as initializations for network-based optimization.

In the second case, distance computations on raster files, many applications use operations on the immediate eight neighbors of each cell in the raster (i.e., they work on a 3 × 3 array of cells). Where these involve distance calculations, for example, when the lengths of paths are computed across a cellular model or hydrological flows are analyzed in a digital terrain model, the distance computations are often incorrect. Local Euclidean distances (1 for N, S, E, W movements and 1.414... for diagonal compass movements) lead to errors of almost 8% in overall distances and to errors in optimal path selection. The solution in such cases is to use optimal 3 × 3 or 5 × 5 values (real or integer approximations) or to use exact Euclidean distance transforms (DTs) that correct the propagated errors (see, further, de Smith et al. 2009, Sect. 4.4.2.2). Furthermore, DTs may be used as a geocomputational procedure that can solve a variety of spatial optimization problems, notably optimal route finding in nonuniform free-space (nonnetwork) environments where gradient, curvature, and other constraints apply.
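The size of the error just mentioned is easy to demonstrate. The following sketch compares the best 8-neighbor path length between two raster cells with the exact Euclidean distance; the displacement chosen (close to 22.5 degrees) is roughly the worst case for this metric:

```python
import math

def eight_neighbor_distance(dx, dy):
    # Best path using only king's moves: diagonal steps cover the shorter
    # axis, orthogonal steps cover the remainder.
    shorter, longer = sorted((abs(dx), abs(dy)))
    return shorter * math.sqrt(2) + (longer - shorter)

dx, dy = 100, 41                       # displacement near 22.5 degrees
exact = math.hypot(dx, dy)
approx = eight_neighbor_distance(dx, dy)
print(f"exact {exact:.2f}, 8-neighbor {approx:.2f}, "
      f"error {100 * (approx - exact) / exact:.1f}%")  # roughly 8%
```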

5 Computational Spatial Statistics

Our earlier section on biologically inspired models embraces a large part of the subject we have been referring to as geocomputation. To some extent in parallel, a separate and rather different area of geocomputation has developed. This involves the application of computationally intensive methods to a range of problems arising in spatial statistics. Openshaw’s (1987) groundbreaking work in this field involved attempting to apply raw computing power to identify potentially significant clusters of point-referenced data, such as the incidence of rare diseases (e.g., cases of childhood leukemia). More recently, geocomputational methods have been developed that provide similar functionality within a more statistically robust framework. Principal among those now widely used are spatial scan statistics, originally developed by Kulldorff (1997) at the National Cancer Institute in the USA. Kulldorff’s scan procedures have been developed and extended over the years, and many of these developments have been implemented in the software package SaTScan (for a recent example application, see Greene et al. 2010). As the SaTScan authors state, the software is designed to:

(i) perform geographical surveillance of disease, to detect spatial or space-time disease clusters, and to see if they are statistically significant;
(ii) test whether a disease is randomly distributed over space, over time, or over space and time;
(iii) evaluate the statistical significance of disease cluster alarms; and
(iv) perform repeated time-periodic disease surveillance for early detection of disease outbreaks.

This description of how SaTScan operates, as a computationally intensive approach to spatial statistical analysis, is typical of many forms of modern spatial analysis. It utilizes computational power to search for and examine patterns within large volumes of data. In this case the procedures applied involve a scanning process, similar in concept to scanning algorithms applied in remote sensing and image processing (e.g., as used in the computation of distance transforms, noted above) – it is fast, simple, and exhaustive. In many other instances the search space is more complex, often multidimensional, and the search procedures cannot be exhaustive but must rely on heuristics, including the biologically inspired procedures described earlier.

Another feature common to geocomputational statistics is the production of pseudoprobability distributions by large-scale simulations or random permutations. These procedures recognize the limitations of traditional analytical methods and seek to use observations and/or aspects of observed spatial structure to generate a large set of possible outcomes under conditions of randomization. Using these simulations, one or more statistics of interest are computed, and the observed value in a given dataset is then regarded as a particular realization within the simulated population. If the observed statistic appears exceptional (e.g., the mean distance to nearest neighbor in a bounded point set is measured and found to be very small or very large when compared to a large number of random simulations using the same number of points within the same spatial extent), then it can be regarded as “significant.” Care must be taken when carrying out such procedures to avoid generating pseudoprobability distributions that involve resampling the same data (or regions) multiple times and to take into consideration questions of the independence of samples.
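The randomization logic described above can be written down compactly. The sketch below tests whether the mean nearest-neighbor distance of an observed point set is unusually small relative to complete spatial randomness; the observed points and the number of simulations are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(7)

def mean_nn_distance(points):
    # Mean distance from each point to its nearest neighbor.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return d.min(axis=1).mean()

# "Observed" pattern: two tight clusters in the unit square (invented data).
observed = np.vstack([
    rng.normal(0.25, 0.03, (30, 2)),
    rng.normal(0.75, 0.03, (30, 2)),
])

obs_stat = mean_nn_distance(observed)
sims = np.array([
    mean_nn_distance(rng.random(observed.shape))  # same n, same extent
    for _ in range(999)
])
# Pseudo p-value: rank of the observed statistic among the simulations.
p = (1 + (sims <= obs_stat).sum()) / (len(sims) + 1)
print(f"observed {obs_stat:.4f}, simulated mean {sims.mean():.4f}, p = {p:.3f}")
```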


Likewise, for data relating to planar lattices (e.g., all census tracts within a state), statistics may be computed and compared with a large number of permutations of the observed data values across the lattice. This procedure, which is widely used to generate pseudoprobability distributions and evaluate the “significance” of observed patterns, is less satisfactory from a statistical perspective. It assumes that the data values for the M zones are effectively fixed and could have been allocated at random to any of the M zones in the study area. In practice this is not the case – it is more reasonable to assume that the total count (e.g., of cases of a specific illness) could have been randomly divided into M partitions (based on a uniform or observed frequency distribution) and each then assigned at random to one of the zones. The assigned values may then be used to calculate the statistics of interest, with repetition of this process yielding a pseudoprobability distribution that can then be used for comparative purposes. However, this procedure also has limitations since the partitioning and allocation among zones is purely random and may not reflect important variations between these zones, for example, in terms of the size of the population at risk. Furthermore, establishing an appropriate null hypothesis in such cases may be difficult or impossible since zonal boundaries are often arbitrary and some level of spatial autocorrelation is always present.

Procedures such as geographically weighted regression and spatial regression models (see the chapters ▶ “Geographically Weighted Regression” and ▶ “Cross-Section Spatial Regression Models” in this handbook), spatial analysis on networks (SANET), and geostatistical modeling can also be considered as geocomputational methods. They rely on computational power to produce insights into the statistical significance of patterns and to facilitate model building where the parameter space is large and the observed datasets increasingly large and complex. Other computationally intensive statistical procedures, such as Markov Chain Monte Carlo (MCMC) techniques widely used in Bayesian model building (e.g., the GeoBUGS project), bootstrapping, and cross-validation, may all be considered as having close links with the field of geocomputation.
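The multinomial randomization described above can be sketched as follows, using an invented set of zonal disease counts; here the allocation probabilities are weighted by population at risk, one way of addressing the limitation just noted:

```python
import numpy as np

rng = np.random.default_rng(3)

observed_cases = np.array([4, 2, 19, 5, 3, 6])             # invented counts
population_at_risk = np.array([900, 450, 1100, 800, 500, 950])
total = observed_cases.sum()
shares = population_at_risk / population_at_risk.sum()

def statistic(counts):
    # Maximum standardized excess over the population-based expectation.
    expected = total * shares
    return ((counts - expected) / np.sqrt(expected)).max()

obs = statistic(observed_cases)
# Pseudoprobability distribution: reallocate the total count at random
# among the zones, weighted by population at risk.
sims = np.array([
    statistic(rng.multinomial(total, shares)) for _ in range(999)
])
p = (1 + (sims >= obs).sum()) / (len(sims) + 1)
print(f"observed statistic {obs:.2f}, pseudo p-value {p:.3f}")
```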

6 Conclusions

A feature of many spatial datasets is their increasing complexity and size – many datasets are now available at a variety of levels of aggregation, not always relating to the same data. Techniques for maximizing the use of such data – for example, combining point and areal measures – are becoming more common. Likewise there is a growing availability of spatial datasets for different time periods, varying from relatively long-period time slices (e.g., annual) to monthly, daily, or even shorter time windows. This has led researchers to seek to revise some traditional models, such as those relating to spatial interaction and trip distribution, to incorporate diurnal variations in population (e.g., home-based and work-based shopping trips, evaluating the optimal location of ambulances or police patrols). In such cases the objective is to improve the quality of models in order to provide improved understanding of the processes at work and the outcomes that can reasonably be predicted.

However, in a wide-ranging discussion of computational model building, Batty (2011, p. 28) argues that “as the level of detail in terms of sectors, spatial-locational resolution, and temporal resolution increases, data demands generally increase and models become increasingly difficult to validate in terms of being able to match all the model hypotheses ... to observed data. As temporal processes are added, this can become exceptionally difficult ... and we face a severe problem of validation ... This tends to force modeling back to the traditional canons of scientific inquiry where parsimonious and simple models are the main goal of scientific explanation.” Thus, there is much to be gained from advances in model construction using computationally intensive procedures, but the results can be very difficult to validate, comprehend in detail, or justify to stakeholders, leading to considerable tension within the research community.

Advances in visualization, ranging from improved 2D and 3D graphics to new metaphors for data exploration (e.g., Google Earth, dynamic flythroughs, immersive systems) and video display of the progress of geosimulations, all help to overcome some of the issues raised by Batty. And despite the difficulties, as the program for the 2011 Geocomputation conference identifies, this field is one attracting intense interest, is of great practical significance, and is set to become one of the defining scientific paradigms of the twenty-first century.

References

Batty M (2000) Geocomputation using cellular automata. In: Openshaw S, Abrahart RJ (eds) Geocomputation. Taylor and Francis, London, pp 95–126
Batty M (2011) A generic framework for computational spatial modeling. Working paper 166. Centre for Advanced Spatial Analysis (CASA), UCL, London. Available from http://www.casa.ucl.ac.uk/publications/workingPaperDetail.asp?ID=166. Accessed 4 Oct 2011
de Smith MJ, Goodchild MF, Longley PA (2009) Geospatial analysis: a comprehensive guide to principles, techniques and software tools, 3rd edn. Troubador Publishing, Leicester. Also available online at http://www.spatialanalysisonline.com
Fischer MM (2006) Spatial analysis and geocomputation: selected essays, vol 1. Springer, Heidelberg
Fischer MM, Leung Y (eds) (2001) GeoComputational modelling: techniques and applications. Springer, Berlin/Heidelberg/New York
Greene SK, Schmidt MA, Slobierski MG, Wilson ML (2010) Spatio-temporal patterns of viral meningitis in Michigan, 1993–2001. In: Fischer MM, Getis A (eds) Handbook of applied spatial analysis: software tools, methods and applications. Springer, Berlin/Heidelberg/New York, pp 721–735
Heppenstall AJ, Harland K, Smith DM, Birkin MH (2011) Creating realistic synthetic populations at varying spatial scales: a comparative critique of population synthesis techniques. In: Geocomputation 2011 conference proceedings. UCL, London, pp 1–8
Huang Q, Yang C, Li W, Wu H, Xie J, Cao Y (2011) Geoinformation computing platforms. In: Yang R et al (eds) Advanced geoinformation science. CRC Press, Boca Raton, pp 79–126
Kulldorff M (1997) A spatial scan statistic. Commun Stat Theory Methods 26:1481–1496
Openshaw S (1987) A mark 1 geographical analysis machine for the automated analysis of point data sets. Int J Geogr Inf Syst 1:335–358
Openshaw S, Abrahart RJ (eds) (2000) Geocomputation. Taylor and Francis, London
Wikipedia: Conway’s Game of Life. Available from http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life
Wolfram S (1983) Statistical mechanics of cellular automata. Rev Mod Phys 55:601–643

Bayesian Spatial Analysis

Chris Brunsdon

Contents
1 Introduction
2 Kinds of Spatial Data
3 Bayesian Approaches for Point Data
4 A Roughness-Based Prior
5 Bayesian Approaches for Point-Based Measurement Data
6 Region-Based Measurement Data
7 Conclusions
References

Abstract

This chapter outlines the key ideas of Bayesian spatial data analysis, together with some practical examples. An introduction to the general ideas of Bayesian inference is given, and in particular the key role of MCMC approaches is emphasized. Following this, techniques are discussed for three key types of spatial data: point data, point-based measurement data, and area data. For each of these, appropriate examples of spatial data are considered, and examples of their use are provided. The chapter concludes with a discussion of the advantages that Bayesian spatial analysis has to offer, as well as some of the challenges that this relatively new approach faces.

Keywords

Posterior distribution · Markov chain Monte Carlo · Prior distribution · Bayesian inference · Spatial data


1 Introduction

Bayesian analysis has seen an enormous increase in popularity over recent years. There are a number of possible reasons for this. Firstly, a framework in which prior beliefs may be incorporated can offer certain advantages. In particular, one can examine a new study in the light of findings of previous studies – something which cannot be done in a classical framework. Secondly, inferences drawn are based on posterior distributions for parameters of interest. Some find this a more intuitive basis for inference than classical significance tests and confidence intervals. Indeed, one author considering the teaching of elementary statistics observed that some of his students misinterpreted classical inference in a way that coincided with Bayesian inference (Berry 1997). Thirdly, a probability distribution for a parameter contains more information than a classical confidence interval. For example, a bimodal posterior or a highly skewed posterior would offer a more subtle interpretation of the outcome of a study than a simple interval or point estimate.

To recall the basic ideas of Bayesian inference, an individual supplies prior beliefs about some unobservable parameter (or set of parameters) θ in the form of a probability distribution or probability density function f(θ). If the observable data x has a likelihood function L(x|θ), then Bayes' theorem can be used to obtain the relationship

$$f(\theta \mid x) \propto f(\theta)\, L(x \mid \theta) \qquad (1)$$

thus giving an expression for the probability distribution of the unobservable θ given the observable data x. This is a very different framework from the classical or frequentist approach to statistical inference about θ. Although the latter makes use of L(x|θ), θ itself is treated as a deterministic, but unobservable, quantity. As such, f(θ) and f(θ|x) are not considered as relevant concepts, and instead, hypotheses as to whether statements about θ are true are used as the basis for inference. The need to supply f(θ) in Bayesian inference is a notable qualitative distinction between the two approaches, as it requires the analyst to supply a subjective set of beliefs about θ as part of the analysis process – although these beliefs could represent a state of impartiality (e.g., by supplying a uniform distribution for θ). In this situation the prior distribution is often referred to as a noninformative prior. One practical difficulty with this approach is that Eq. (1) only defines the posterior distribution up to a constant of proportionality. To normalize the distribution (so that the integral over all θ values is one), the equation

$$f(\theta \mid x) = \frac{f(\theta)\, L(x \mid \theta)}{\int f(\theta)\, L(x \mid \theta)\, d\theta} \qquad (2)$$

is used. However, the integral in the denominator of Eq. (2) is not always analytically soluble, which in the past has led to some difficulties with carrying out Bayesian analysis in practice. Fortunately, recent advances in computational techniques have addressed this issue. In particular, the advent of Markov chain Monte Carlo (MCMC) simulation-based approaches has allowed Bayesian approaches to be applied in a wide variety of situations where analytical results are intractable. The ideas of Bayesian inference in general, and of MCMC methods, are covered in Sect. 9, "Spatial Econometrics," in this major reference work.

The aim of this chapter is to focus on how these ideas can be applied to the analysis of spatial data. In many ways there is nothing special about Bayesian techniques for spatial data – essentially the principles underlying the inferential process (stemming from Bayes' theorem) are the same as those used for any kind of data. However, although Bayesian analysis of spatial data may have the same overarching framework, there are distinct characteristics of the kinds of model to which it is applied and also of the kinds of data structure one is likely to encounter. Thus, in this chapter, models in which the random components are spatially correlated, and the ways in which Bayesian inference is made about spatial characteristics of the modeled processes, will be considered. Firstly, the kinds of spatial data with which Bayesian methods are commonly used will be outlined. Next, examples of how Bayesian methods are applied for each type of data will be given. Finally, some practical considerations will be considered.
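Before turning to spatial data, the mechanics of Eqs. (1) and (2) can be made concrete in a few lines of code. The sketch below is a minimal Python illustration written for this exposition (the gamma prior and Poisson likelihood are an arbitrary example, and all names are chosen here); it normalizes a posterior numerically over a grid of θ values – the brute-force counterpart to the analytical integration in Eq. (2) and to the MCMC methods discussed above.

```python
import numpy as np
from scipy import stats

# Grid approximation to Eq. (2): posterior = prior x likelihood, renormalized.
# Example: unknown Poisson rate theta, Gamma(2, 1) prior, observed counts x.
x = np.array([3, 5, 4, 6, 2])

theta = np.linspace(0.01, 15, 2000)           # grid of parameter values
prior = stats.gamma.pdf(theta, a=2, scale=1)  # f(theta)
# L(x | theta): product of Poisson pmfs over the observations
lik = np.prod(stats.poisson.pmf(x[:, None], theta[None, :]), axis=0)

unnorm = prior * lik                          # Eq. (1): f(theta) L(x|theta)
posterior = unnorm / np.trapz(unnorm, theta)  # Eq. (2): divide by the integral

post_mean = np.trapz(theta * posterior, theta)
print(post_mean)  # close to (2 + sum(x)) / (1 + len(x)) by gamma-Poisson conjugacy
```

Grid approximation of this kind becomes infeasible as the number of parameters grows, which is exactly why the MCMC approaches discussed above are so important in practice.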

2 Kinds of Spatial Data

Elsewhere in this book, a comprehensive list of spatial data types is provided. However, not all of these data types may be analyzed in a Bayesian framework using techniques that are well established at the time of writing. In particular, there are few Bayesian methods that may be applied to arc- or line-based data. Following the typology of Fischer and Wang (2011), some kinds of data for which well-established Bayesian approaches exist are now listed (a minimal in-code representation of these types is sketched after the list):

Point data: These are data consisting of a set of two- or three-dimensional point coordinates in Euclidean space. The points themselves are considered as random, and typically interest lies in modeling a stochastic spatial point process that could have generated the data.

Point measurement data: These are data consisting of a set of spatial points, as before, but here each point has an attached attribute. Typically the attribute is some kind of measurement, such as a temperature taken at that location or the price of a house sold at that location. Here, in general it is only the measured attribute that is assumed to be a random component of the model, the locations being treated as fixed, controlled values – effectively part of the design of the data collection procedure. Typically, the spatial component of models for this kind of data arises from an assumption that the correlations between the attributes are in some way dependent on the relative locations of the points.

Field data: These are data that relate to variables which are conceptually continuous (the field view) and whose observations have been sampled at a predefined and fixed set of point locations. Arguably, these have a great deal in common with point measurement data – although the latter are defined only at a fixed set of points, while field data relate to a sample of point measurements of a mapping from all points in space to an attribute value. In this case, a measurement (such as temperature) could have been taken at any point in space.

Area data: These are data consisting of a set of spatial regions – typically represented as polygons. In most studies, the polygons provide a partition of a geographical study area; that is, their union completely covers the study area, and no pair of polygons intersects, except in some cases on their boundaries. In the latter situation, regions with touching boundaries are said to be adjacent. As with point-based measurement data, an attribute is associated with each entity – the entities now being regions instead of points. Also in common with point-based measurement data, the regions are considered to be fixed (not arising from a random process), and the only component assumed random in models of these data is the attribute. In this case, the spatial aspect of the model is achieved by relating the correlation structure of the random attributes to the relative positions of the regions – in particular, models often make use of the adjacency or otherwise of each pair of regions in the data set.

Spatial interaction data: These data (also termed origin-destination flow or link data) consist of measurements or counts, each of which is associated with a pair of point locations or a pair of areas. For example, travel-to-work data, listing the number of people traveling from a given origin (home) zone to a given destination (workplace) zone, fall into this category.

In this chapter, attention will be focused on point data, point measurement data, and area data – as in Cressie (1993) – although the technique suggested for point measurement data may also be applied to field data.
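As promised above, one possible way of representing this typology in code is sketched below. This is purely illustrative – the class names and fields are invented here, not part of any established library – but it records which components are treated as random and which as fixed.

```python
from dataclasses import dataclass
from typing import List, Tuple

Coord = Tuple[float, float]  # a 2D point location

@dataclass
class PointData:
    # The locations themselves are the random quantity of interest.
    points: List[Coord]

@dataclass
class PointMeasurementData:
    # Fixed locations; the attached attribute is the random component.
    points: List[Coord]
    values: List[float]  # e.g., temperature or house price at each point

@dataclass
class AreaData:
    # Fixed polygons; one random attribute per region, with adjacency
    # information driving the spatial correlation structure.
    polygons: List[List[Coord]]       # each polygon as a ring of coordinates
    values: List[float]
    adjacency: List[Tuple[int, int]]  # pairs of adjacent region indices
```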

3 Bayesian Approaches for Point Data

A key model for point data is the spatial Poisson process. In such a process, points occur independently of one another, with the intensity of occurrences at location s = (s1, s2) being given by Λ(s). For any area A within the region of study, the number of occurrences has a Poisson distribution with mean

$$\int_{s \in A} \Lambda(s)\, ds. \qquad (3)$$

In such models, it is informative to estimate Λ(s), since mapping this function allows regions of high intensity to be identified, and related features, such as areas of high variability in intensity or locations of peak intensity, can be investigated. One very simple method of estimation is to use a pixel- or regular lattice-based approach. Suppose the study area is partitioned into a number of small, identical tessellating polygons (typically squares or regular hexagons), Δk for k ∈ {1, . . . , n}. If λk is the mean value of Λ(s) for s ∈ Δk, then

$$\lambda_k = \frac{1}{A} \int_{s \in \Delta_k} \Lambda(s)\, ds \qquad (4)$$

where A is the area of each Δk. If the value of A is reasonably low, so that each Δk occupies a very small proportion of the study area, it is reasonable to assume that Λ(s) does not vary much within each Δk, so that the lattice-based approximation

$$\Lambda(s) = \lambda_k \quad \text{for } s \in \Delta_k \qquad (5)$$

may be used. Now, if ck is the number of points occurring in Δk, then ck has a Poisson distribution with mean Aλk. For simplicity, assume for now that spatial units are chosen such that A = 1. This simplifies equations without any loss of generality. With this assumption in place, ck has a Poisson distribution with mean λk. At this stage, a Bayesian approach may be applied to estimate λk, since the Poisson distribution of ck gives

$$\Pr(c_k \mid \lambda_k) = \frac{\exp(-\lambda_k)\, \lambda_k^{c_k}}{c_k!} \qquad (6)$$

and so if the prior probability density function for λk is f(λk), then the posterior distribution f(λk | ck) is related to the prior and the likelihood function by

$$f(\lambda_k \mid c_k) \propto f(\lambda_k)\, \exp(-\lambda_k)\, \lambda_k^{c_k}. \qquad (7)$$

In particular, if the prior for λk is a gamma distribution, proportional to $\lambda_k^{\alpha-1} \exp(-\beta \lambda_k)$ for λk > 0, and zero otherwise, and where the constant of proportionality is independent of λk, then we have

$$f(\lambda_k \mid c_k) \propto \exp(-(1+\beta)\lambda_k)\, \lambda_k^{c_k + \alpha - 1} \quad \text{if } \lambda_k > 0, \text{ and zero otherwise} \qquad (8)$$

which is also a gamma distribution for λk, with updated parameters β′ = β + 1 and α′ = α + ck. In particular, in the case where α = β = 0, we have $f(\lambda_k) \propto \lambda_k^{-1}$. This is an example of an improper prior distribution – although it is not a well-defined probability density function itself, the corresponding posterior distribution is well defined, being a gamma distribution with parameters α = ck and β = 1. In this case, the expected value of the posterior distribution for λk is just ck. This value may be used to provide a point estimate of λk for each area Δk.

As an example, consider the inventory data of the Zurichberg Forest, Switzerland (see Mandallaz (2008) for details), which lists the locations (and types) of trees in the forest. These data are provided with the kind authorization of the Forest Service of the Canton of Zurich. Figure 1 shows the raw tree location data and the estimated values of Λ(s) using the above approach, with the Δk zones being elements of a hexagonal tessellating grid.

Fig. 1 Raw trees data from the Zurichberg Forest (RHS) and estimates of Λ(s) based on a hexagonal grid and noninformative priors for the finite elements λk (LHS)

One characteristic of this kind of estimate is that although it is only a minor generalization of the raw data (it simply estimates the intensity in Δk as being proportional to the observed count of events in that area, ck), the underlying Bayesian theory also provides a posterior distribution, so that one could, for example, estimate other features of the posterior distribution for λk, such as the upper 95th posterior percentile for each λk, or the summed intensity over some arbitrary region of the forest. In either case, posterior distributional features could be estimated (possibly via simulation) in addition to point estimates.

As an example, consider the problem of estimating the average intensity over the entire forest. Assuming there are n Δk elements covering the forest, this quantity is

$$\frac{1}{nA} \int_{s \in F} \Lambda(s)\, ds \quad \text{where } F = \bigcup_{k=1,\ldots,n} \Delta_k. \qquad (9)$$

If length units are chosen such that A = 1 as before, this is equal to

$$\frac{1}{n} \sum_{k=1,\ldots,n} \lambda_k \qquad (10)$$

which is the mean of all of the λk values. For brevity, this will now be denoted as $\bar\lambda$. If each λk has a posterior distribution as set out in Eq. (8), then it can be seen that

$$f\!\left(\bar\lambda \mid \{c_k,\ k = 1, \ldots, n\}\right) \propto \bar\lambda^{\, n\bar c - 1} \exp(-n\bar\lambda) \quad \text{if } \bar\lambda > 0, \text{ and zero otherwise} \qquad (11)$$

where $\bar c$ is the arithmetic mean of the ck's. Thus the posterior distribution of $\bar\lambda$ is also gamma, with shape parameter $n\bar c$ and rate parameter n; this distribution is shown in Fig. 2. To obtain an interval estimate of $\bar\lambda$, the a% Highest Density Region (HDR) (Hyndman 1996) can be found. This is the region in the set of possible values of $\bar\lambda$ such that the posterior probability that $\bar\lambda$ lies in the region is a/100, and the posterior probability density of any point within the region is higher than that of any point outside it. For the Zurichberg Forest data, the 99% HDR is the interval (0.757, 0.814). This is indicated on Fig. 2.

Fig. 2 Posterior probability distribution of mean intensity of trees in forest. 99% HDR is also indicated

The above example used a prior probability density function that assumed independence between the individual discretized intensity levels λk. This assumption is manifested in certain characteristics of the intensity estimates. In Fig. 1 the estimates show quite a large amount of spatial "roughness," that is, there are a number of Δk zones whose mean intensity estimates λk are very different in value from their neighbors. However, there may be reason to expect that in fact these values should vary smoothly. If this is the case, Bayesian inference can be a useful tool since such expectations of smoothness can be expressed via the prior distribution.

An important issue is now to define roughness. In terms of any function G(s), rather than the discrete approximation, one possible measurement of roughness is the expression

$$R(G) = \int_{s \in F} \left( \frac{\partial^2 G}{\partial s_1^2} + \frac{\partial^2 G}{\partial s_2^2} \right)^{\!2} ds \qquad (12)$$

where s = (s1, s2). Note that the two partial second derivatives in Eq. (12) measure rate of change of slope and that if both of these have high positive or high negative values, this indicates sharp maxima or minima – hence, squaring the sum of these and integrating over the study area is a measure of the propensity of G(s) to have sharp peaks and pits and is therefore a plausible measure of roughness. Note also that R(G) = 0 if and only if

$$\frac{\partial^2 G}{\partial s_1^2} + \frac{\partial^2 G}{\partial s_2^2} = 0 \qquad (13)$$

that is, if G(s) is a solution of Laplace's equation – frequently referred to as a harmonic function. Harmonic functions exhibit the mean value property

$$G(s) = \frac{1}{2\pi r} \int_{t \in P(s,r)} G(t)\, dt \qquad (14)$$

if G is a harmonic function, and P(s,r) denotes a circular path of radius r centered on the location s, provided that all of P(s,r) lies strictly within the region in which G(s) is defined. In less mathematical terms, this states that the value of a harmonic function at a given point is equal to the mean value of that function taken over a circular path centered on that point. Again, this can be thought of as a condition of smoothness. Returning to Eq. (12), R(G) can be thought of as a measurement of the discrepancy between G and a harmonic function; considered from this viewpoint, it provides an alternative interpretation of this quantity as a measure of roughness.
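The gamma–Poisson updates of this section are easy to reproduce numerically. The following minimal Python sketch is written for this exposition and is not the code used for the forest analysis: the counts are synthetic stand-ins for the forest data, and the HDR search assumes a unimodal posterior, as holds for the gamma distribution here. It applies the improper-prior update of Eq. (8) cell by cell and computes a 99% HDR for the mean intensity using the posterior of Eq. (11).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
c = rng.poisson(0.8, size=500)   # synthetic counts per hexagonal cell (A = 1)

# Cell-wise posterior under the improper prior f(lambda_k) ~ 1/lambda_k:
# lambda_k | c_k ~ Gamma(shape=c_k, rate=1), so the posterior mean is c_k (Eq. 8).
post_mean_cells = c.astype(float)

# Mean intensity: lambda_bar ~ Gamma(shape = n*c_bar, rate = n), as in Eq. (11).
n = len(c)
post = stats.gamma(a=c.sum(), scale=1.0 / n)

# 99% HDR for a unimodal density: the narrowest interval containing mass 0.99,
# found by sliding the lower tail probability and minimizing the interval width.
p_lo = np.linspace(1e-4, 0.0099, 500)
lo, hi = post.ppf(p_lo), post.ppf(p_lo + 0.99)
i = np.argmin(hi - lo)
print("posterior mean:", post.mean(), "99% HDR:", (lo[i], hi[i]))
```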

4 A Roughness-Based Prior

In the previous section, a measurement of roughness was proposed. In this section, this measure will be used to construct a Bayesian prior encapsulating subjective expectations of smoothness in Λ(s), which will be combined with observed point data to obtain a posterior distribution for intensity. To do this, the roughness of some function G that is related to Λ may be used to construct a prior probability density function for the value of R(G), with an exponential form suggested:

$$f(R(G)) \propto \kappa \exp(-\kappa R(G)) \qquad (15)$$

defined for positive R(G). The mean of this prior distribution is $\kappa^{-1}$, so that low values of κ suggest that a high degree of roughness is expected.

Here, G will be related to Λ by $G(s) = \sqrt{\Lambda(s) + 3/8}$. This is chosen since the transformation gives a distribution close to normal and has a variance-stabilizing effect when the counts in the zones Δk have a Poisson distribution (Anscombe 1948; Mäkitalo and Foi 2011). In practice, as in the previous example, a finite element approach is used to estimate G(s). In particular, it can be shown that R(G) can be approximated by

$$\tilde R = \text{const} \sum_{k=1,\ldots,n} \left( g_k - \frac{1}{6} \sum_{i=1,\ldots,6} g_{k(i)} \right)^{\!2} \qquad (16)$$

Fig. 3 Diagram showing notation for numerical values of neighbors of zone Δk (the central hexagonal zone)

where $g_{k(i)} = \sqrt{\lambda_{k(i)} + 3/8}$ indicates the mean value of G associated with each of the neighboring tessellating zones to Δk, as set out in Fig. 3. Provided δ is reasonably small, these mean values {gk, gk(1), gk(2), gk(3), gk(4), gk(5), gk(6)} are close to the sampled values of G at the points annotated on the figure.

To see how this estimate is derived, firstly assume that each of the lines emanating from the central gk has a length δ. To estimate $\partial^2 G / \partial s_1^2$ and $\partial^2 G / \partial s_2^2$ at the point (s1, s2), a quadratic surface of the form

$$Q(s_1 + t_1, s_2 + t_2) \approx \gamma_0 + \gamma_1 t_1 + \gamma_2 t_2 + \gamma_{11} t_1^2 + \gamma_{12} t_1 t_2 + \gamma_{22} t_2^2 \qquad (17)$$

is fitted to the values gk and gk(1), . . . , gk(6) at the (t1, t2) locations corresponding to the central point and the six hexagonally arranged lines in Fig. 3, respectively, using least squares approximation. The locations of the seven points are

$$(0,0),\ \left(\tfrac{\sqrt{3}}{2}\delta,\, -\tfrac{1}{2}\delta\right),\ (0, -\delta),\ \left(-\tfrac{\sqrt{3}}{2}\delta,\, -\tfrac{1}{2}\delta\right),\ \left(-\tfrac{\sqrt{3}}{2}\delta,\, \tfrac{1}{2}\delta\right),\ (0, \delta),\ \left(\tfrac{\sqrt{3}}{2}\delta,\, \tfrac{1}{2}\delta\right) \qquad (18)$$

and the corresponding Q values are {gk, gk(1), gk(2), gk(3), gk(4), gk(5), gk(6)}. Using this approximation and applying second partial derivatives, we obtain the approximations $\partial^2 G / \partial s_1^2 \approx \gamma_{11}$ and $\partial^2 G / \partial s_2^2 \approx \gamma_{22}$, so that

$$\frac{\partial^2 G}{\partial s_1^2} + \frac{\partial^2 G}{\partial s_2^2} \approx \gamma_{11} + \gamma_{22} \qquad (19)$$

It may be checked that, when the values for the point locations and intensities are substituted into the formula for the least squares fitting, we have

$$\gamma_{11} + \gamma_{22} = 2\left( \frac{1}{6} \sum_{i=1,\ldots,6} g_{k(i)} - g_k \right) \qquad (20)$$

and note also, from the Taylor expansion of G(s + t), that this approximation tends asymptotically to the true value as δ tends to zero. Finally, it is interesting to note that this result can be used to derive a discrete version of the mean value property, with the circular path around a point being approximated by the centroids of the neighboring hexagonal lattice elements.

Now, returning to the definition of roughness in Eq. (12), the approximation in Eq. (16) may be obtained. Entering this expression for $\tilde R$ (as an approximation for R) into Eq. (15) yields a roughness-penalty-based prior distribution for each gk:

$$\Pr(g_k \mid g_{k(1)}, \ldots, g_{k(6)}) \propto \kappa \exp\left( -\kappa \sum_{k=1,\ldots,n} \left( g_k - \frac{1}{6} \sum_{i=1,\ldots,6} g_{k(i)} \right)^{\!2} \right). \qquad (21)$$

It may be noted that this distribution takes the form of an intrinsic conditional autoregressive (ICAR) model (Besag 1974; Besag and Kooperberg 1995) – with precision parameter κ. This is an improper prior, as it does not have a well-defined multivariate distribution for the vector g; however, in conjunction with certain likelihood functions (e.g., multivariate normal), the posterior probability density is well defined. Thus, a prior distribution for the gk values is constructed, and therefore, one for the λk values can be derived. However, this prior distribution requires the parameter κ to be provided. One possibility – if the analyst has a clear idea of the degree of roughness to expect – is to specify a particular value in advance. However, in many situations one cannot realistically do this, and another approach – demonstrated here – is to specify a hyperprior for the quantity. In this case, Eq. (21) is assumed to be conditioned on κ as well, with the prior for κ being an improper prior equal to a small constant.

Having constructed this prior distribution for the parameters, it is now necessary to consider the posterior distribution. Since we are working with the transformed parameters $g_k = \sqrt{\lambda_k + 3/8}$, the counts in each zone Δk will also be transformed using the same function; that is, ck, the count of trees in each Δk, will be transformed to $c'_k = \sqrt{c_k + 3/8}$. Anscombe (1948) suggests that Poisson counts from a distribution with mean λk transformed in this way have an approximately normal distribution with mean gk and variance 1/4. Making use of this approximation, a degree of algebraic manipulation shows that the conditional posterior distributions for the gk's and κ are then given by

$$\Pr\!\left(g_k \mid g_{k(1)}, \ldots, g_{k(6)}, \kappa\right) \propto \mathcal{N}\!\left( \frac{\frac{\kappa}{6} \sum_{i=1,\ldots,6} g_{k(i)} + 4 c'_k}{\kappa + 4},\ \frac{1}{\sqrt{\kappa + 4}} \right)$$

$$\Pr(\kappa \mid g_1, \ldots, g_n) \propto \mathrm{Gamma}\!\left( n,\ \sum_{k=1,\ldots,n} \left( g_k - \frac{1}{6} \sum_{i=1,\ldots,6} g_{k(i)} \right)^{\!2} \right) \qquad (22)$$

where $\mathcal{N}(\cdot)$ and Gamma(·) denote the normal and gamma distributions with the standard parameterizations. This specification of posterior probabilities lends itself to a Markov chain Monte Carlo (MCMC) approach. Rather than estimating the parameters of interest analytically, this approach draws simulated samples from the posterior distribution of the parameters of interest. Essentially, if estimates of all parameters except one are provided, the remaining one is then drawn from one of the conditional distributions shown in Eq. (22). Cycling through the parameter space, provided each parameter is drawn from the correct conditional distribution, a draw of all parameters from the full posterior distribution is obtained. If a large number of such multivariate draws are made, then the empirical distribution may be used to estimate features of this posterior probability distribution. In this case, an MCMC approach is used to estimate and map the posterior mean of each λk, which is achieved by transforming the simulated distribution of the gk's via the inverse Anscombe transform $\lambda_k = g_k^2 - 3/8$. The results are shown in Fig. 4.

Fig. 4 Posterior mean estimates of trees in forest, based on implementing the MCMC algorithm set out in Eq. (22)

A further advantage of the MCMC approach is that quantities related to the λk values may be estimated in a very natural way, by simply computing


these quantities for the simulated posterior λk values and viewing the resulting distribution of the computed quantities. For example, a quantity of interest may be the slope of Λ(s). This is defined to be the magnitude of the vector ∇Λ(s). As before, a discrete approximation will be used – and this may be obtained from the coefficients of the quadratic expression Q in Eq. (17). In fact

$$\frac{\partial \Lambda}{\partial s_1} \approx \gamma_1 \quad \text{and} \quad \frac{\partial \Lambda}{\partial s_2} \approx \gamma_2 \qquad (23)$$

so that

$$|\nabla \Lambda(s)| = \sqrt{ \left( \frac{\partial \Lambda}{\partial s_1} \right)^{\!2} + \left( \frac{\partial \Lambda}{\partial s_2} \right)^{\!2} } \approx \sqrt{ \gamma_1^2 + \gamma_2^2 } \qquad (24)$$

It may be checked that, by using a least squares estimation of Q,

$$\begin{bmatrix} \gamma_1 \\ \gamma_2 \end{bmatrix} = \frac{1}{3} \begin{bmatrix} \frac{\sqrt{3}}{2}\lambda_2 + \frac{\sqrt{3}}{2}\lambda_3 - \frac{\sqrt{3}}{2}\lambda_5 - \frac{\sqrt{3}}{2}\lambda_6 \\ \lambda_1 + \frac{1}{2}\lambda_2 - \frac{1}{2}\lambda_3 - \lambda_4 - \frac{1}{2}\lambda_5 + \frac{1}{2}\lambda_6 \end{bmatrix} \qquad (25)$$

which may be used to obtain an estimate of the slope. Applying this to the MCMC simulations obtained earlier yields the map in Fig. 5, which shows the regions in which the tree density undergoes the most change. As a final approach, one could designate any area where the slope exceeds 1.5 to be a transition zone. The idea of this is to identify boundary regions around the forest; for example, these could equate to ecotones (Holland and Risser 1991; Allen and Breshears 1998). Again, a function is applied to the simulated λk values to obtain a binary variable Tk, which takes the value one if the estimated slope exceeds 1.5, and zero otherwise. By counting the number of times (in 1,000 simulations) that Tk is equal to one, the posterior probability that each zone Δk is part of a transitional area may be estimated. In addition, the posterior standard deviations are also illustrated (Fig. 6).

Fig. 5 Posterior mean estimates of the slope of Λ(s)

Fig. 6 Posterior mean estimates of Tk (left) and associated standard deviations (right – see text)
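To show how such a sampler might be organized end to end, the sketch below is a minimal Python illustration written for this exposition – it is not the code used for the forest analysis. It makes several stated simplifications: synthetic counts on a square grid with rook (four-cell) neighbors in place of the hexagonal (six-cell) neighborhood of Fig. 3; all cells refreshed in one vectorized sweep rather than a strict one-at-a-time Gibbs scan; the reconstruction of Eq. (22) given above; and numpy's finite-difference gradient as a crude stand-in for the quadratic-surface slope estimate of Eq. (25).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic counts on a regular grid (hypothetical data, not the forest inventory).
counts = rng.poisson(2.0, size=(30, 30))
c_prime = np.sqrt(counts + 3.0 / 8.0)            # Anscombe-transformed counts

def neighbor_mean(g):
    # Mean of available rook neighbors (edge cells have fewer than four).
    s = np.zeros_like(g)
    m = np.zeros_like(g)
    s[1:, :] += g[:-1, :]; m[1:, :] += 1
    s[:-1, :] += g[1:, :]; m[:-1, :] += 1
    s[:, 1:] += g[:, :-1]; m[:, 1:] += 1
    s[:, :-1] += g[:, 1:]; m[:, :-1] += 1
    return s / m

g = c_prime.copy()                               # initialize the chain at the data
kappa = 1.0
n = counts.size
burn_in, n_iter = 500, 2500
lam_sum = np.zeros_like(g)
t_sum = np.zeros_like(g)

for it in range(n_iter):
    # g_k update: normal with precision kappa + 4 (cf. the reconstruction of Eq. 22)
    mean = (kappa * neighbor_mean(g) + 4.0 * c_prime) / (kappa + 4.0)
    g = rng.normal(mean, 1.0 / np.sqrt(kappa + 4.0))
    # kappa update: gamma with shape n and rate equal to the roughness penalty
    penalty = np.sum((g - neighbor_mean(g)) ** 2)
    kappa = rng.gamma(shape=n, scale=1.0 / penalty)
    if it >= burn_in:
        lam = g ** 2 - 3.0 / 8.0                 # inverse Anscombe transform
        lam_sum += lam
        dy, dx = np.gradient(lam)                # crude stand-in for Eq. (25)
        t_sum += (np.hypot(dx, dy) > 1.5)        # transition indicator T_k

draws = n_iter - burn_in
lam_post_mean = lam_sum / draws                  # posterior mean map, cf. Fig. 4
transition_prob = t_sum / draws                  # transition probabilities, cf. Fig. 6
```

The key point this sketch illustrates is the one made in the text: once posterior draws of the λk's are available, any derived quantity – slopes, threshold indicators, regional sums – is estimated simply by computing it for each draw and summarizing the resulting distribution.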

5 Bayesian Approaches for Point-Based Measurement Data

In this section, analysis of a set of data points will be considered, with each point having an attached attribute. In this case, the spatial point locations will not be treated as random; for example, they may be points where a set of measurements was taken, such as soil conductivity or rainfall levels, or the rental or sale prices of houses at a given set of addresses. The attributes will, however, be treated as a random spatial process. Typically, this will be done by specifying a variogram or correlation function that gives the degree of correlation between the attributes associated with a pair of points. See the chapter ▶ "Multivariate Spatial Process Models" by Gelfand in this handbook. For a stationary process, this correlation will depend only on the distance between the point pairs – so that measurements associated with any pair of points separated by a given distance will have the same correlation, regardless of their absolute spatial location. Analyzing data using a model of this kind, particularly with a view to interpolating measured values, is referred to as kriging – based on the early work in this area developed by Matheron; see Matheron (1970, 1973) for example.

The underlying idea here is that the attribute values are observations from a field defined as a real-valued random function over a space S. The randomness can occur in two ways. Firstly, the function itself is modeled as a Gaussian random process,


with a mean value E(z) = m(s), where E(z) is the expected value of the attribute value z, and m(.) is a function of location s ∈ S. For ordinary kriging this function is just a constant value, μ. For a fuller discussion, see Cressie (1993). The covariance between two values at locations s1 and s2 is given by σ²ρ(s1, s2), where ρ(s1, s2) is the correlation between the z-values at s1 and s2 – and if the process depends only on the distance between the points, as stated above, then the ρ(.) function can be written in the form ρ(|s1 − s2|), or ρ(d) if d = |s1 − s2|. The relationship is often expressed in the form of a variogram:

$$\gamma(d) = \sigma^2 (1 - \rho(d)) \qquad (26)$$

A number of possible functional forms for γ are frequently used. Some common examples include

$$\text{Exponential:} \quad \gamma(d) = \sigma^2 \left( 1 - \exp\left( -\frac{d}{h} \right) \right)$$
$$\text{Spherical:} \quad \gamma(d) = \sigma^2 \left( \frac{3d}{2h} - \frac{d^3}{2h^3} \right) \text{ if } d < h;\ \sigma^2 \text{ otherwise} \qquad (27)$$
$$\text{Gaussian:} \quad \gamma(d) = \sigma^2 \left( 1 - \exp\left( -\frac{d^2}{2h^2} \right) \right)$$

In each of these cases, the parameter h controls the amount of correlation between attribute values at locations separated by a given distance. Although the exact interpretation differs for each of the functions, it is generally the case that larger values of h suggest correlation persists at larger distances. Note that the list above is not exhaustive, but also that arbitrary specification of these functions is not possible – in general, functions must be chosen such that, for any set of points in S, the covariance matrix implied by the function is positive definite.

When analyzing this kind of data, there are generally two key issues requiring investigation. The first is the calibration of the variogram function, which, in the particular forms above, implies the estimation of the parameters h and σ². The second is the use of this model for interpolation. Although the Gaussian process is sampled at a finite number of locations, it is often useful to estimate values of E(z) at other locations. In particular, given a vector of observed attribute values x, interpolation at a new point s can be thought of as estimating the conditional mean of z at this location, given x – written as E(z|x). In both cases, although the primary aim may be the investigation of parameters, being able to make statements about the precision or accuracy of these statements is of importance.

The original approach to kriging calibrates the variogram using point estimates for the parameters and then "plugs them in" to the expression for E(z|x). However, more recently, attention has focused on approaches that allow for uncertainty in the variogram parameter estimates. In terms of interpolation, these provide a more realistic picture of kriging estimates – although the original approaches provide expressions for variance in E(z|x), these are conditioned on the parameter values in the variogram. In reality these

Fig. 7 Locations and asking prices for the Liverpool house price data

parameters are generally estimated and subject to uncertainty due to the sampling process – and this in turn adds uncertainty to the estimates for E(z|x). A Bayesian approach – such as that given by Diggle et al. (1998) – is one way of addressing this issue. Prior probability distributions are supplied for the variogram parameters, and on the basis of observations, a posterior distribution function is derived. In the reference above, as in the previous section, an approach using Markov chain Monte Carlo simulations is used.

The technique is illustrated here with a set of house price data obtained from Nestoria (http://www.nestoria.co.uk) – a web site providing listings of houses currently for sale. Here a set of 260 semidetached three-bedroom houses was downloaded for the Liverpool area of the United Kingdom. This set contains all three-bedroom semidetached houses listed on 28th December 2011, excluding those whose price was not published in the listings. (A very small number of houses are listed as "price on application.") Prices were scaled to units of thousands of pounds to avoid rounding errors in calculations; thus, an asking price of £199,950 would be recorded as £199.95k. House locations are recorded as latitude and longitude and then transformed to the OS National Grid projection coordinates in meters. A plot of the data is given in Fig. 7. From this, it may be seen that there is a degree of spatial clustering in these data; for example, a group of higher-priced houses is visible to the north, and another group of lower-priced houses may be seen in the southern part of the area.

Here, an exponential variogram model is used – with a uniform prior on [0, 30] km for h and a reciprocal prior for σ². Calibration is via the MCMC approach, as outlined in the previous section. This is achieved using the geoR package for the R statistical programming language. As an example of the variogram calibration aspects of the analysis, the posterior distribution curve for the parameter h is


Fig. 8 Posterior distribution (via MCMC) of h in the house price variogram

shown in Fig. 8. Note that since geoR uses discrete approximations for distributions, this takes the form of a histogram. It shows that the posterior distribution for h peaks at around 4 km, although it has quite a long tail. In a classical inference approach, this kind of insight into inference about h is rarely provided; in general, just a two-number confidence interval is given. In Fig. 9 the correlation derived from the variogram associated with each of the possible values for h is shown. For each h, the correlation curve is drawn with an intensity corresponding to the posterior marginal probability of that value of h. Reading off vertically from the x-axis suggests posterior probabilities of the correlation associated with a given separation distance. From this it can be seen that there is very little correlation between pairs of observations separated by more than about 20 km.

Moving on from inference about the variogram, as stated above, it is also possible to make inferences about the values of E(z|x) where there are no observations. In this case, the technique is used to evaluate such values at regular points on a grid, so that "house price surfaces" can be drawn, outlining regional trends in house prices around the city of Liverpool. Again, the geoR package in R (Ribeiro and Diggle 2001) is used to achieve this. Results are shown in Fig. 10. As these are also the results of MCMC simulation, it is possible to visualize the accuracy of the estimates by showing a corresponding surface of the standard deviations of the posterior distributions of the estimates of E(z|x) at each location on the grid. These are

Fig. 9 Posterior estimates of the correlation curve

Fig. 10 Posterior predicted mean house price values based on Bayesian kriging

illustrated in Fig. 11. Note that in this figure, lower values of the posterior standard deviation (and hence greater confidence in the estimation) occur at locations close to the data points. To see this, compare Figs. 7 and 11.
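The variogram forms of Eq. (27) are straightforward to evaluate directly. The sketch below is a small Python illustration written for this exposition – the analysis above was carried out with the geoR package in R, so this is not the original code, and the function name and parameter values are chosen here, with h set near the posterior peak reported above.

```python
import numpy as np

def variogram(d, sigma2, h, kind="exponential"):
    """Evaluate the variogram gamma(d) of Eq. (27) at distances d."""
    d = np.asarray(d, dtype=float)
    if kind == "exponential":
        return sigma2 * (1.0 - np.exp(-d / h))
    if kind == "spherical":
        g = sigma2 * (3.0 * d / (2.0 * h) - d ** 3 / (2.0 * h ** 3))
        return np.where(d < h, g, sigma2)
    if kind == "gaussian":
        return sigma2 * (1.0 - np.exp(-(d ** 2) / (2.0 * h ** 2)))
    raise ValueError(f"unknown variogram: {kind}")

d = np.linspace(0.0, 60.0, 121)           # separation distances in km
gamma = variogram(d, sigma2=1.0, h=4.0)   # h near the posterior peak above
rho = 1.0 - gamma / 1.0                   # implied correlation, from Eq. (26)
```

In a fully Bayesian treatment, curves like rho would be computed for each posterior draw of (h, σ²), producing the fan of correlation curves weighted by posterior probability that Fig. 9 depicts.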

6 Region-Based Measurement Data

In this section, data associated with regions (such as states or counties that may be represented by polygons) will be discussed. This section will be somewhat briefer than the previous two – a key reason for this is that the approach seen in Sect. 3 is

Fig. 11 Posterior predicted standard deviations of house price values based on Bayesian kriging

quite similar to those that may be applied here. This is because, for point data, a discrete grid approximation was used, and this is essentially applying Bayesian approaches to region-based data, with the "regions" provided by a regular lattice. A number of models have been developed to describe multivariate distributions where each variable is a quantity associated with a region. For example, if zi is associated with region i, then the simultaneous autoregressive (SAR) model is

$$z_i = \mu_i + \sum_{j=1}^{n} b_{ij} (z_j - \mu_j) + e_i \qquad (28)$$

where $e_i \sim \mathcal{N}(0, \sigma_i^2)$ independently, μi = E(zi), and the bij are constants, with bii = 0. Frequently, if wij is an indicator variable stating whether regions i and j are adjacent (with the convention that wii = 0), then the b-values are modeled as bij = ρs wij, and the values of μi and ρs are parameters to be estimated. Note that here the s is not intended as an indexing subscript; it simply denotes that this parameter is associated with the SAR model. In the simplest case for μi, we may assume it takes a constant value μ and also that $\sigma_i^2$ takes the constant value σ² – so that all that is needed is to estimate the three values μ, σ², and ρs.

A related model is the conditional autoregressive (CAR) model, which specifies the distribution of zi conditionally on all of the other z-observations, denoted by z−i:

$$z_i \mid z_{-i} \sim \mathcal{N}\!\left( \mu_i + \sum_{j=1}^{n} c_{ij} (z_j - \mu_j),\ \tau_i^2 \right) \qquad (29)$$

where, similar to before, cii = 0. Again, a common form of model for cij is cij = ρc wij. Assuming that the μi's and $\tau_i^2$'s are fixed for all regions, we again have a model with


just three parameters, μ, τ², and ρc. In each of these cases, Bayesian analysis can be carried out in a fairly intuitive way. For both the CAR and SAR models, it is possible to rearrange the model equations into the form of multivariate distributions of z, the column vector of zi observations. For the SAR model, we have

$$\mathbf{z} \sim \mathcal{N}\!\left( \mu \mathbf{1}_n,\ \sigma^2 (\mathbf{I}_n - \rho_s \mathbf{W})^{-1} \left[ (\mathbf{I}_n - \rho_s \mathbf{W})^{-1} \right]^{\mathsf{T}} \right) \qquad (30)$$

where $\mathbf{1}_n$ is a column vector of n ones, $\mathbf{I}_n$ is the n × n identity matrix, and W is the matrix of wij values. Similarly, the factorization theorem – see Besag (1974), for example – gives for the CAR model:

$$\mathbf{z} \sim \mathcal{N}\!\left( \mu \mathbf{1}_n,\ \tau^2 (\mathbf{I}_n - \rho_c \mathbf{W})^{-1} \right) \qquad (31)$$

Using prior distributions similar to those from Sect. 3, combined with either Eq. (28) or Eq. (29) to provide the likelihood function, enables a multivariate posterior distribution for σ² (or τ²), ρs (or ρc), and μ to be derived and simulated. With this in mind, it is relatively straightforward to simulate posterior distributions for either of these sets of three parameters or, for some priors, to derive the posterior distributions analytically.
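As a concrete illustration of Eqs. (30) and (31), the following minimal Python sketch – written for this exposition, with an invented four-region toy adjacency matrix – builds the implied covariance matrices and evaluates the multivariate normal log-likelihood that would be combined with priors on μ, σ² (or τ²), and ρs (or ρc) in a Bayesian analysis. For validity, ρ must be chosen so that the relevant matrix is invertible (and, for the CAR model, symmetric positive definite).

```python
import numpy as np

# Adjacency matrix W for a toy map of four regions (an assumed example layout).
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = W.shape[0]
I = np.eye(n)

def sar_cov(sigma2, rho_s):
    # Eq. (30): sigma^2 (I - rho_s W)^{-1} [(I - rho_s W)^{-1}]^T
    A = np.linalg.inv(I - rho_s * W)
    return sigma2 * A @ A.T

def car_cov(tau2, rho_c):
    # Eq. (31): tau^2 (I - rho_c W)^{-1}; rho_c must keep this positive definite.
    return tau2 * np.linalg.inv(I - rho_c * W)

def sar_loglik(z, mu, sigma2, rho_s):
    # Multivariate normal log-likelihood under the SAR model, up to a constant;
    # in a Bayesian analysis this would be multiplied by the prior densities.
    S = sar_cov(sigma2, rho_s)
    r = z - mu
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (logdet + r @ np.linalg.solve(S, r))

print(sar_loglik(np.array([1.2, 0.8, 1.1, 0.5]), mu=1.0, sigma2=0.2, rho_s=0.2))
```

Inspecting sar_cov and car_cov for various ρ values also makes Wall's (2004) point, discussed below, easy to reproduce: the implied correlations do not always decay intuitively with distance between regions.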

7 Conclusions

The above discussion demonstrates that it is possible to apply Bayesian inference to a number of spatial problems. Here, problems have been broadly classified in terms of the form of geographical information used as the basis of the analysis: point-based data, point-based measurement data, and region-based measurement data. In each case some form of Bayesian analysis has been proposed. However, the list here is by no means exhaustive. For example, it may be possible to apply kriging and variogram-based methods (Oliver 2010) to regional data, either by assigning a centroid point to each region or – perhaps more realistically – by expressing regional values as the average or sum of a random field defined within each region. It is also possible to extend these models – whereas in this chapter the expected value of the observed measurements has been modeled as a constant, it is possible to adopt a regression approach and express this quantity as a function of explanatory variables. Bayesian analysis provides a useful inferential tool for calibrating all of these models, offering a relatively rich set of techniques for drawing inferences relating to model parameters and, particularly when using MCMC, for drawing inferences relating to functions of these parameters or to predictive distributions for future observed variables.

There are other forms of spatial data, and these may also be usefully analyzed using Bayesian techniques. For example, linear or network data have not been considered here, but such data forms may also be usefully analyzed using models which lend themselves to Bayesian analysis.


However, perhaps a greater challenge for spatial analysis is the ability to assess which model for a given data set is the most appropriate or whether a particular model is entirely inappropriate. The use of inferential tools as stated above is applied within a framework where the "true" model is a member of a family of models; for example, when analyzing regional data, although the specific values of μ, ρc, and σ² may not be assumed, it is taken as given that the model can be specified in the general framework of a CAR model. However, how might one decide whether this is more appropriate than a SAR model? This is not a simple matter of testing "nested" models, where one set of models is a subset of another and inference can be based on posterior distributions of the parameters in the larger set, but a matter of comparing structurally distinct models. Indeed, how does one choose between either of these models and a kriging-based model as suggested earlier? A number of ideas have been proposed for this in a Bayesian framework, but this area of research is relatively new; the deviance information criterion (DIC) of Spiegelhalter et al. (2002) is one attempt to address this – although some debate has been raised – see, for example, Ando (2007), who discusses a tendency for the DIC to favor models with larger numbers of parameters and proposes a modification to address this.

Another issue related to this is that of model appropriateness. Some recent criticism has been aimed at CAR- and SAR-based models – Wall (2004) notes that although the W matrix suggests that there is some spatial structure in the model, the actual variance-covariance matrices in Eqs. (30) and (31) can provide counterintuitive relationships between the distances between regions and their correlations, stating that

". . . although these covariances are clearly just functions of B or C, in general there is no obvious intuitive connection between them and the resulting spatial correlations."

(Here B and C are matrices of bij and cij coefficients.) Although the Bayesian approach allows us to make relative assessments of the most appropriate parameter values within the modeling framework, it is also important to determine whether any combination of parameters would be meaningful. These observations are highly significant; a great deal of work (Bayesian and otherwise) has used models of this kind and has in general accepted that they encapsulate spatial dependency in a reasonable way.

In summary, then, it is argued that Bayesian analysis has a great deal to offer to spatial analysis and provides a richness of inferential tools, particularly via MCMC, that allow insights to be made that may not otherwise be possible, or at least not achieved as easily or intuitively. However, there are still a number of challenges, although arguably many of these – such as the issues of model appropriateness or model selection – are problems that face all kinds of statistical inference.

References

Allen CD, Breshears DD (1998) Drought-induced shift of a forest–woodland ecotone: rapid landscape response to climate variation. Proc Natl Acad Sci 95(25):14839–14842
Ando T (2007) Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models. Biometrika 94(2):443–458
Anscombe FJ (1948) The transformation of Poisson, binomial and negative-binomial data. Biometrika 35(3–4):246–254
Berry D (1997) Teaching elementary Bayesian statistics with real applications in science. Am Stat 51(3):241–246
Besag J (1974) Spatial interaction and the statistical analysis of lattice systems (with discussion). J R Stat Soc B 36(2):192–236
Besag J, Kooperberg C (1995) On conditional and intrinsic autoregression. Biometrika 82(4):733–746
Cressie N (1993) Statistics for spatial data, rev edn. Wiley, New York
Diggle PJ, Tawn J, Moyeed R (1998) Model-based geostatistics. Appl Stat 47(3):299–350
Fischer MM, Wang J (2011) Spatial data analysis: models, methods and techniques. Springer, Berlin/Heidelberg/New York
Holland M, Risser PG (1991) Ecotones: the role of landscape boundaries in the management and restoration of changing environments. Chapman and Hall, New York
Hyndman RJ (1996) Computing and graphing highest density regions. Am Stat 50(2):120–126
Mäkitalo M, Foi A (2011) A closed-form approximation of the exact unbiased inverse of the Anscombe variance-stabilizing transformation. IEEE Trans Image Process 20(9):2697–2698
Mandallaz D (2008) Sampling techniques for forest inventories. Chapman and Hall/CRC, Boca Raton
Matheron G (1970) The theory of regionalised variables and its applications. Les Cahiers du Centre de Morphologie Mathématique de Fontainebleau
Matheron G (1973) The intrinsic random functions and their applications. Adv Appl Probab 5(3):439–468
Oliver MA (2010) The variogram and kriging. In: Fischer MM, Getis A (eds) Handbook of applied spatial analysis. Springer, Berlin/Heidelberg/New York, pp 319–352
Ribeiro P Jr, Diggle P (2001) geoR: a package for geostatistical analysis. R-NEWS 1(2):15–18
Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A (2002) Bayesian measures of model complexity and fit (with discussion). J R Stat Soc Ser B Stat Methodol 64(4):583–639
Wall MM (2004) A close look at the spatial correlation structure implied by the CAR and SAR models. J Stat Plan Inference 121(2):311–324

Geovisualization

Ross Maciejewski

Contents
1 Introduction
2 Statistical Graphics
2.1 Histograms
2.2 Box Plots
2.3 Scatterplots
2.4 Parallel Coordinate Plots
3 Choropleth Maps
3.1 Color
3.2 Class Intervals
4 Exploratory Data Analysis
5 Exploring Time
5.1 Animation
5.2 Space-Time Cube
6 Conclusion
References

Abstract

The current ubiquity of data collection is providing unprecedented opportunities for knowledge discovery and extraction. Data sources can be large, complex, heterogeneous, structured, and unstructured. In order to explore such data and exploit opportunities within the data deluge, tools and techniques are being developed to help data users generate hypotheses, explore data trends, and ultimately develop insights and formulate narratives with their data. These tools often rely on visual representations of the data coupled with interactive computer interfaces to aid the exploration and analysis process. Such representations fall under the purview of visualization, in which scientists have worked on systematically exploiting the human visual system as a key part of data analysis. Research in this area has been inspired by a number of historical sources; examples include physicist James Maxwell's sculpture of a thermodynamic surface in 1874, Leonardo da Vinci's hand-drawn illustrations of water from his studies of the processes underlying water flow, and the flow map of Napoleon's March on Moscow produced by Charles Minard in 1869. Each of these examples attempts to explain data in a visual manner, and, as visualization has progressed, principles and practices have been adopted to standardize representations and, more importantly, better exploit properties of the human visual system.

Keywords

Visual representation · Human visual system · Exploratory data analysis · Color scheme · Spatiotemporal data

1 Introduction

The current ubiquity of data collection is providing unprecedented opportunities for knowledge discovery and extraction. Data sources can be large, complex, heterogeneous, structured, and unstructured. In order to explore such data and exploit opportunities within the data deluge, tools and techniques are being developed to help data users generate hypotheses, explore data trends, and ultimately develop insights and formulate narratives with their data. These tools often rely on visual representations of the data coupled with interactive computer interfaces to aid the exploration and analysis process. Such representations fall under the purview of visualization, in which scientists have worked on systematically exploiting the human visual system as a key part of data analysis. Research in this area has been inspired by a number of historical sources; examples include physicist James Maxwell's sculpture of a thermodynamic surface in 1874, Leonardo da Vinci's hand-drawn illustrations of water from his studies of the processes underlying water flow, and the flow map of Napoleon's March on Moscow produced by Charles Minard in 1869. Each of these examples attempts to explain data in a visual manner, and, as visualization has progressed, principles and practices have been adopted to standardize representations and, more importantly, better exploit properties of the human visual system.

In this chapter, we will focus on how visualization research has effectively utilized one of the most ubiquitous visual representations, the cartographic map. Cartography is the study and practice of making maps and is perhaps the most well-studied visualization technique available to scientists. For centuries, cartographers have looked at combining science, aesthetics, and analysis in the mapmaking process on the premise that such tools will be able to effectively communicate information and aid in knowledge generation. One of the most famous examples of such knowledge generation is John Snow's mapping of the locations of cholera deaths in Soho, England, in 1854. By plotting cholera deaths and the locations of water pumps, Snow was able to develop a hypothesis about the waterborne nature of the disease. He then used his visual explorations of the cholera outbreak patterns to persuade the local town council to disable the water pump at the center of the outbreak. This sort of data analysis, utilizing cartographic principles as a means of representing spatiotemporal data and exploring patterns within this data, is often referred to as geographic visualization or geovisualization.

Geovisualization focuses on visually representing spatiotemporal data, exploiting known cartographic techniques as part of the interactive graphical representation of the data, and incorporating dynamic interactions for querying and exploring data. A rigorous definition of geovisualization can be found in MacEachren (1994): "Geographic visualization (can be defined) as the use of concrete visual representations – whether on paper or through computer displays or other media – to make spatial contexts and problems visible, so as to engage the most powerful human information-processing abilities, those associated with vision." Note that the above definition discusses the use of visual representations (plural) of the data. Representations include complex glyphs, shaded areas, lines, and diagrams, and given the amount of data and the differences in ways to represent data, it is important to consider what sort of questions will be asked of the data. Thus, rather than trying to make one "best" map or data representation which depicts only a subset of the available information, geovisualization systems often incorporate a variety of data views in an attempt to generate more insight for analysts. Often, the views generated involve the computation of basic statistics and summaries of the data as a means of providing the user with an overview of their data prior to, or as part of, the exploration process.

2 Statistical Graphics

As perhaps a precursor to, or an overlapping field with, data visualization, much of the groundwork in describing and exploring data sets comes from the statistical graphics community. Statistical graphics are tools used to reveal details about data sets, such as outliers and trends within the data. By revealing such features, one can speculate on how data can be further processed, transformed, explored, and analyzed. Hypothesis generation would begin in this stage, and such hypothesis generation would lead to the use of certain methods for hypothesis testing and data modeling. Finally, proper statistical methods could then be chosen for a further refined analysis of the data.

What is of key importance in data visualization is that when a statistical graphic is made, the information from the data set being explored is encoded by the chosen display method. An analyst will look at the visual representation, and then a decoding process will occur. During this decoding process, visual perception becomes the vital link. No matter how impressive the encoding is, if the visual decoding cannot be done by the analyst, then the data analysis will fail. This encoding and decoding of graphical elements is explored in The Elements of Graphing Data (Cleveland 1985), which describes detailed perceptual studies of how well humans are able to perceive particular encodings (relative angles, line lengths, etc.). Much use has been made of these studies in further developing visualizations, and a variety of methods are discussed in Visualizing Data (Cleveland 1993). Three of the most common exploratory data analysis graphics used for exploring the distribution of data include the box plot, the histogram, and the scatterplot (Fig. 1).

Fig. 1 Common statistical graphics. (Left) Histogram. (Center) Box plot. (Right) Scatterplot

2.1 Histograms

One of the first things explored when analyzing data is the distribution of the given data measurements. According to Wilkinson (2005), the histogram (Fig. 1, left) is one of the most widely used visual representations and first-look analysis tools. Introduced by Pearson (1895), the histogram provides a visual summary of a univariate sample within a data set. The visual summary consists of rectangles drawn over discrete intervals (called classes or bins), where the height of each rectangle corresponds to the number of data samples falling into the given bin. The main concern in creating a histogram lies in the choice of the number of bins and the width of the bins. Different numbers of bins and different bin widths can each reveal different insights into the data. Thus, the initial choice of the bin number and size can have a dramatic impact on the knowledge that can be derived when using a histogram visualization. Most statistical graphics programs default to one of two options when creating a histogram: the square root choice or Sturges' choice (Sturges 1926). The square root choice is defined such that, given a data set with n samples, the number of bins k is calculated as follows:

$$k = \left\lceil \sqrt{n}\, \right\rceil \qquad (1)$$

Once the number of bins is known, it is assumed that each bin will be of equal width. Thus, the bin width h is defined as follows:

$h = \frac{\max(x) - \min(x)}{k}$   (2)

where x is the univariate data set under analysis. For comparison, Sturges’ choice (Sturges 1926) is defined as

$k = \lceil \log_2(n) \rceil + 1$   (3)

Both the square root choice and Sturges’ choice implicitly assume a normally distributed data set. In the construction of his approximation, Sturges considered an idealized frequency histogram that approaches the shape of a normal distribution; thus, if the data are not normal, the number of bins chosen will need to be revised. However, for moderately sized n (approximately n < 200), Sturges’ rule will produce reasonable histograms. For further details on other histogram bin choices, please refer to Wilkinson (2005). While a vast amount of research has been done on bin selection, the key point is that every bin choice has strengths and weaknesses. If small values of h are used (with respect to the data in question), the histogram will model fine details within the data; however, noise in the data set can obstruct the information being presented and may obfuscate the analysis. Conversely, with large values of h, the density becomes too smoothed, and one can oversimplify the description of the data.
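To make these two rules concrete, the following minimal sketch (assuming only numpy and matplotlib, with a hypothetical sample x) computes the bin counts of Eqs. (1) and (3) and draws the resulting histograms side by side; even on the same data, the two choices can suggest different shapes.

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical univariate sample; any 1-D array of measurements would do.
rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=1.0, size=150)
n = len(x)

# Square root choice, Eq. (1): k = ceil(sqrt(n)).
k_sqrt = int(np.ceil(np.sqrt(n)))

# Sturges' choice, Eq. (3): k = ceil(log2(n)) + 1.
k_sturges = int(np.ceil(np.log2(n))) + 1

# Equal bin widths, Eq. (2), are computed implicitly by hist().
fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
axes[0].hist(x, bins=k_sqrt)
axes[0].set_title(f"Square root choice (k = {k_sqrt})")
axes[1].hist(x, bins=k_sturges)
axes[1].set_title(f"Sturges' choice (k = {k_sturges})")
plt.show()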

2.2 Box Plots

While a histogram allows us to explore the distribution of a univariate measurement within our data set, an analyst often wants to quickly determine key summary values of the data distribution and compare this distribution to others. One measure that is key to visualizing data distributions is the quantile. The f quantile, q( f ), is defined as the value along the data measurement scale where approximately a fraction f of the data are less than or equal to q( f ). In the case of quartiles, approximately one-fourth of the data is less than q( f ) for the lower quartile, half for the second quartile (the median), and three-fourths for the upper quartile. The benefit of such measures is that the f-values provide a standard for comparisons across distributions. One graphical way of using quantiles to compare distributions is the box plot. Box plots, Fig. 1 (middle), consist of several distinct graphical elements, namely, the box and the whiskers. The box itself represents the range from the lower quartile to the upper quartile of the data, with the bottom edge of the box being the lower quartile and the top edge being the upper quartile. The dot in the middle of the box represents the sample median; if the dot is not directly in the center of the box, this is an indication of skewness in the data. The whiskers are the dashed lines that extend above and below the box.

These lines represent the extent of the remaining data samples: assuming no outliers, the maximum value of the distribution is represented by the top line attached to the upper whisker, and the minimum value of the distribution is represented by the bottom line. If outliers exist, they are represented by small circles at the top or bottom of the plot. A data sample is considered to be an outlier if its value is more than 1.5 times the interquartile range away from the top or bottom of the box. While a box plot may seem simpler (or even less intuitive) than a histogram, its main advantage is its limited screen space requirement: a larger number of box plots can be plotted on the screen at once for comparison than would be possible for entire histograms. Furthermore, while the visual representation of a histogram is highly influenced by the choice of bin width, there is no such limitation on the box plot.
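The quantities a box plot encodes are straightforward to compute directly. The sketch below (numpy only, with an invented sample) returns the quartiles, the whisker extents, and any points flagged by the 1.5 x interquartile-range rule described above.

import numpy as np

def box_plot_summary(x):
    # Quartiles and the interquartile range (IQR).
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Samples beyond the fences are drawn as outlier circles.
    outliers = x[(x < lower_fence) | (x > upper_fence)]
    # Whiskers extend to the most extreme samples inside the fences.
    inside = x[(x >= lower_fence) & (x <= upper_fence)]
    return {"q1": q1, "median": median, "q3": q3,
            "whisker_low": inside.min(), "whisker_high": inside.max(),
            "outliers": outliers}

x = np.array([2.1, 2.4, 2.5, 2.7, 2.8, 3.0, 3.1, 3.3, 9.8])
print(box_plot_summary(x))  # 9.8 is flagged as an outlier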

2.3 Scatterplots

While the box plot allows users to compare and summarize distributions of like groups, analysts often wish to search for correlations between variables within their data set. One common means of visually representing the relationship between two variables within a data set is the scatterplot. Scatterplots visualize multidimensional data sets by assigning two of the data dimensions to the graphical axes, Fig. 1 (right). Points of the data set are then rendered in the Cartesian space defined by the axes. These plots are typically employed as a means of analyzing bivariate relationships within a planar projection of the data and are used to help researchers understand potential underlying correlations between variables (Tufte 1983). They provide a quick means of assessing data distributions, clusters, outliers, and correlations. Given a scatterplot, an analyst will visually assess the relationship between the variables being plotted by looking for trends in the plot. If the points tend to approximate a line running from the lower left to the upper right, this is often indicative of a positive correlation between variables; likewise, if the points tend to approximate a line running from the upper left to the lower right, this can be indicative of a negative correlation. Such plots can also be enhanced by fitting a line to the data to help visualize such linear correlations. Cleveland (1993) notes, though, that putting a smooth curve through the data in the mind’s eye is not a good method for assessing nonlinearity and can bias the analyst. While scatterplots are good at assessing linear correlations, they can also be effective at suggesting higher-order correlations: points in the plot may visually approximate other shapes, such as exponential or logarithmic curves. These early insights into the data can provide an excellent starting point for analysts and further generate new hypotheses about the relationships between variables within their data. While scatterplots allow one to assess the bivariate relationships in data, data sources being analyzed today are typically multivariate. Common extensions to the scatterplot include encoding another variable as the color of the points on the plot or encoding a variable as the size of the points. Even more commonly, a matrix of all possible variable combinations may be drawn such that each column of the matrix contains the same x-axis and each row contains the same y-axis for a given scatterplot. Such a matrix is called a scatterplot matrix and is useful for visualizing how a data set is distributed across multiple variables.

Fig. 2 A parallel coordinate plot representing six data attributes

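Such a matrix can be produced in a few lines with pandas’ built-in scatter_matrix helper; the data frame below is purely illustrative, with one deliberately correlated pair of variables.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

# Hypothetical multivariate data set.
rng = np.random.default_rng(0)
income = rng.normal(50, 10, 200)
spending = 0.6 * income + rng.normal(0, 5, 200)  # positively correlated
age = rng.uniform(18, 80, 200)                   # roughly uncorrelated
df = pd.DataFrame({"income": income, "spending": spending, "age": age})

# Every column pair becomes one panel; the diagonal shows each
# variable's histogram instead of a degenerate x-vs-x plot.
scatter_matrix(df, diagonal="hist", figsize=(6, 6))
plt.show()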

2.4 Parallel Coordinate Plots

While scatterplots can provide an overview of relationships between multivariate data dimensions, they are limited in terms of screen space and by the fact that they only present two variable relationships at a time. Even by extending scatterplots to scatterplot matrices, the amount of screen space needed to represent the variables can still be inadequate for showing all possible relationships within a data set. In order to overcome such limitations for multidimensional data exploration, the parallel coordinate plot was introduced by Inselberg (1985). Parallel coordinate plots are composed of a series of univariate data axes, with each axis representing some measurement within a data set. These axes are drawn parallel to each other, as shown in Fig. 2. Each element of the data set is then represented as a line running through its corresponding value on each axis. As more variable attributes are added, more axes are added and more connections are made. In interpreting parallel coordinate plots, the idea is that the user will recognize correlations simply by the shape the plots make. If the lines between two axes are close to parallel, the two variables have a correlation coefficient approximately equal to one. If the lines between two axes intersect one another in the middle of the graph, the variables tend to have a high negative correlation. If the lines cross each other at many different angles between two axes, the correlation coefficient between these two variables is approximately zero.

It is crucial to note that the ordering of the coordinate axes plays a major role in the analysis phase. In fact, effective dimensional ordering is a key component of many visualization techniques (e.g., star glyphs, pixel-oriented techniques), and a good ordering of the data can enhance the overall analysis process (Ankerst et al. 1998). Furthermore, the scaling and spacing of each axis also play a major role in the analyst’s ability to extract information from a parallel coordinate plot. Thus, while parallel coordinate plots are capable of representing more variables than a traditional scatterplot, more care must be taken when creating the visual representation.
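As a small illustration, pandas also ships a parallel_coordinates helper; the data frame below is hypothetical, and reordering its columns is how one would experiment with the axis orderings discussed above.

import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# Hypothetical observations; the "group" column only colors the lines.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "var1":  [0.2, 0.3, 0.8, 0.9],
    "var2":  [0.25, 0.35, 0.75, 0.85],  # moves with var1 (parallel lines)
    "var3":  [0.9, 0.8, 0.2, 0.1],      # moves against var2 (crossing lines)
})

# Axes are drawn in column order, so df[["group", "var3", "var1", "var2"]]
# would test a different ordering.
parallel_coordinates(df, class_column="group")
plt.show()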

3 Choropleth Maps

As discussed in the introduction, visualization is the process of creating interactive visual representations of data in order to generate knowledge about a particular process or phenomenon found within the data. While any pictorial representation of the data may have the potential to generate knowledge, the question being asked of the data can directly influence what type of visualization will be most effective. In the previous sections, common statistical graphics were explored, each providing an overview of data distributions. However, in those visualizations, the spatial relationships inherent within the data were ignored. In geovisualization, often the first task given to the data is to “compare location x to location y.” Such a task can easily be supported by plotting data elements on a map, which the user can then visually explore for patterns. Data often come in the form of georeferenced latitude/longitude pairs or are aggregated into geographical areas on a map. Figure 3 illustrates several potential geographical visualizations of spatial data on a map. In Fig. 3a, locations of criminal offenses are plotted with respect to a centralized location denoted by the pin glyph and the semitransparent circle. In Fig. 3b, an aggregation of a pandemic influenza simulation is visualized, where each county is colored by the number of simulated ill patients. In Fig. 3c, the estimated probability distribution of a criminal offense occurring is plotted, where a darker color represents a higher probability. Each of these representations is able to answer different questions about the data, and the choice of an appropriate underlying representation is crucial in constructing an effective geographic visualization. In order to construct an effective geographic visualization, designers must be aware that choices of glyph shape, size, and color can have a direct impact on how the information presented is perceived. Such choices are important not only in geographical representations but in all visual representations of the data. In The Semiology of Graphics (Bertin 1967) and The Grammar of Graphics (Wilkinson 2005), the mapping of quantitative attributes (e.g., counts, rates, and other measures) to aesthetics (e.g., color, size, shape) is described in great detail. Of key importance, though, is how these mappings are perceived and interpreted by those using the visualization, since the graphics being displayed on the screen are the precise elements about which we are asking questions.

Fig. 3 Sample geographical data visualizations. (a) Plotting data as symbols/glyphs. (b) Aggregating data by county in a choropleth map. (c) Abstracting and estimating data through density estimation

Thus, when choosing what visual aesthetic will be used, the designer first needs to consider what sort of data is going to be mapped to the aesthetic and how it may interact with other items being rendered. With regard to geovisualization, this interplay between cognition and perception for understanding how maps work is succinctly presented in How Maps Work (MacEachren 1995). In this work, MacEachren describes the ways in which the representational choices inherent in mapping interact with our information processing and knowledge construction to form insights into the data. As such, proper design choices based on data attributes are of critical importance in creating a successful geovisualization. In order to illustrate the importance of such design choices, one can consider perhaps the most common type of thematic map, the choropleth map (see Fig. 3b). A choropleth map is a map in which areas are shaded or textured/patterned based on the measurement of a statistical variable in that region (e.g., income, population density). Such maps are based on data aggregated over previously defined regions (e.g., counties, states, countries) and are often used as a first-pass tool for exploring spatial statistics; for example, in exploring the data in Fig. 3b, one may ask “where are the highest rates of illness?” Answers to questions such as that could be gleaned from a simple table of numbers; with the choropleth map, however, a user may also begin comparing rates spatially across regions. In order to compare rates effectively, the colors chosen to render the map need to be interpretable, and much research in both the cartographic and visualization domains has been done on what constitutes an effective color mapping for data values.

3.1 Color

The choice of the color scale is a complicated design choice that depends primarily on the data type, domain problem, and chosen visual representation. Typically, one will choose to constrain the color mapping to a univariate scale. While data are becoming increasingly multivariate (a data set may, for example, contain many different statistical measures within a county), many users have difficulty interpreting bivariate or higher-dimensional color schemes. For a univariate color map, we can constrain ourselves to three types of color schemes (Harrower and Brewer 2003): sequential, divergent, and qualitative (Fig. 4). Each color scheme has its own strengths in regard to what type of data it can best represent. For data containing some sequential order (e.g., rates going from low to high), as the name suggests, a sequential color scheme is the most appropriate. In a sequential scheme, the color is mapped and ordered such that light colors (typically) represent lower values and dark colors represent higher values. The divisions between each color band are chosen to be perceptually differentiable (Harrower and Brewer 2003) and to give the impression of increasing and decreasing values to the user. The divergent color scheme functions in much the same way as the sequential color scheme. The key difference is the use of a comparison point or zero point from which the data are being explored. In divergent scales, a key value of interest is mapped to the middle region of white, and values above and below this value are mapped along sequential color scales.

Fig. 4 Examples of univariate color maps for data visualization

Careful choices must be made when selecting the high- and low-end colors for the scale. Often this is done using the concept of “cool” and “warm” colors as defined by Hardin and Maffi (1997), where reds and yellows are considered warm and blues are considered cool. The qualitative color scheme is unique in that its bands are not perceptually ordered. Instead, the bands of color are chosen to be perceptually different and unorderable. Such a mapping is used to assign distinct data classes to colors; for example, counties that skew toward one political party could be given one color, and those that skew toward a different party another.
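In code, the three scheme types correspond to different colormap families. The sketch below (invented county rates; the reference value of 50 for the divergent scale is an arbitrary assumption) shows one plausible way to derive per-area colors for each scheme.

import numpy as np
import matplotlib.pyplot as plt

# Invented county rates in [0, 100].
rates = np.array([12.0, 35.0, 48.0, 51.0, 63.0, 90.0])

# Sequential: light -> dark encodes low -> high.
seq_colors = plt.cm.Blues(rates / rates.max())

# Divergent: white at an assumed reference value of 50,
# diverging toward warm above and cool below.
centered = (rates - 50.0) / (2 * np.abs(rates - 50.0).max()) + 0.5
div_colors = plt.cm.RdBu_r(centered)

# Qualitative: unordered classes (e.g., dominant political party).
party = np.array([0, 1, 0, 2, 1, 2])
qual_colors = plt.cm.Set1(party)

# Each *_colors array holds one RGBA color per county, ready to be
# passed to whatever routine draws the map polygons.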

3.2 Class Intervals

While the choice of color scheme is critical in creating an appropriate rendering, the way the map is colored is based on a classification of the distribution of the variable being visualized. This classification is analogous to the previous discussion of creating histograms; here, the number of colors being used is analogous to choosing the number of bins. First, the data over the entire geographical space can be analyzed, providing an overview of the data distribution. This distribution is then transformed into a histogram, where each bin of the resultant histogram maps to a particular color. Unfortunately, choosing the number of bins to represent the data is an even more critical task when generating a choropleth map. If too many bins are chosen, analysts will be unable to distinguish the different color values mapped to different regions and may be distracted from seeing trends within their data. If too few bins are chosen, the overall trends will be over-smoothed, and patterns can become hidden within this smoothing. Common methods for classification include quantile, equal interval, and standard deviation classifications; details and comparisons of methods can be found in Monmonier (1972). Once the number of colors is chosen (a good description of color maps and classifications is presented in Harrower and Brewer (2003)), each geographical unit is colored based on its given statistical value. Along with the complexity of choosing categories for choropleth map colors, it is important to understand that size plays a dominant role in our perception. Geographic areas (e.g., counties, zip codes) vary in their shape and area (compare Alaska to Delaware), and a choropleth map is designed such that the map colors provide an equal representation to all areas within the map. Unfortunately, the different sizes of geographical areas can play perceptual tricks on the analyst, hiding changes in the data or drawing attention to unimportant areas of the map. Furthermore, when aggregating data, small areas (like major cities) may overwhelm the data of larger regions (like states).

Such aggregation problems give rise to ecological fallacies, as choropleth maps provide analysts with ample opportunity to make inferences about their data based on the aggregate region.
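The classification step itself is easy to prototype. The function below is a minimal sketch (not a reference implementation) of the three classification schemes named above; np.digitize(values, breaks) would then assign each geographic unit its class, and hence its color.

import numpy as np

def class_breaks(values, k, method="quantile"):
    """Return k class intervals, as k + 1 break points, for a choropleth map."""
    values = np.asarray(values, dtype=float)
    if method == "quantile":
        # Roughly equal numbers of areas per class.
        return np.quantile(values, np.linspace(0, 1, k + 1))
    if method == "equal":
        # Equal-width intervals over the data range.
        return np.linspace(values.min(), values.max(), k + 1)
    if method == "stddev":
        # Intervals centered on the mean, one standard deviation wide.
        m, s = values.mean(), values.std()
        return m + s * np.arange(-(k // 2), k // 2 + 1)
    raise ValueError(method)

rates = np.random.default_rng(1).lognormal(mean=2.0, sigma=0.5, size=100)
for m in ("quantile", "equal", "stddev"):
    print(m, np.round(class_breaks(rates, 4, m), 2))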

4 Exploratory Data Analysis

Until this point, the discussion has centered on generating statistical graphics and relatively simple choropleth maps. Each of these techniques has strengths and weaknesses; however, one can think of these techniques as individually being powerful tools for exploring data. The idea of using these tools as a means of investigating data was termed exploratory data analysis by Tukey (1977). Tukey compared the exploration of data to detective work, in which a detective investigating a crime would need the tools necessary to analyze the crime scene (e.g., a fingerprinting kit) as well as an understanding of how crime works. He noted that data analysts need ways to look at their data, and whether these techniques are graphical, arithmetic, or in-between, the simpler they are made, the better they will be at conveying information to the user. Thus, we can think of the previously discussed visualization methods as a set of tools that can be used in creating geographical visualization systems. Rather than trying to make the best map that can show all of our variables, it is perhaps best to expand from the notion of a single view. Instead, we should realize that data can be represented in a variety of ways, and given the highly multivariate nature of data being collected, a single map or statistical graphic may not be enough. Instead of one “best” view of the data, interactive graphics systems can provide multiple representations of the data. Furthermore, these representations can be programmatically linked or coordinated. Figure 5 illustrates the concept of coordinating views. In this figure, an analyst is exploring criminal incident report data through a variety of displays. These coordinated multiple views (North and Shneiderman 2000) allow an analyst to create several displays of the data, typically involving statistical graphics and/or geographical maps. Each display provides details on data entities, or aggregations of the data, and is linked through a variety of brushing and selection methods. It is through these interactions with the data that analysts are able to derive even more insight into their data. First, initial data views are typically constructed as a means of describing the data to the analyst, thus providing an overview of the data. An analyst then explores the data, generating and investigating hypotheses in a process often referred to by the Visual Information Seeking Mantra coined by Shneiderman (1996): “Overview first, zoom and filter, then details on demand.”

Perhaps the most basic interaction techniques are those of scrolling and panning. Given a large data set, it is possible that some data will be rendered outside of the screen space. In order to explore this data, the user will need to move the display by scrolling or panning in order to bring other data elements into view. These basic interaction techniques spatially separate the current focus from the larger information space by providing a sliding virtual window.

Fig. 5 The visual analytics law enforcement technology toolkit. Here, the user is exploring criminal incident reports using coordinated multiple views. The top-right window (b) shows an interactive menu that allows users to filter through multiple offenses/offense hierarchies/agencies. The top window (c) shows the time series, and the left window (d) shows the calendar views of the selected incident report data. The right window (e) shows a clock view that provides an hourly temporal view of the data. The bottom-left window (f) shows the time slider with radio buttons that allow different temporal aggregation levels

However, scrolling/panning interactions introduce a cognitive burden upon the user, as the analyst must now keep track of the context of their data. Similarly, zooming techniques temporally separate focus regions from the information context, inducing a virtual hierarchical ordering at various zoom levels. Zooming also places a significant cognitive burden on the user to assimilate the focus into the overall information space, as the various zoom levels may not provide any explicit contextual cues. This causes problems such as “desert fog” (Jul and Furnas 1998), in which a view of the information space provides no details on which to base further navigation within the data set. In order to reduce the cognitive load, overview + detail and focus + context techniques were introduced. Overview + detail techniques attempt to reduce the cognitive burden associated with exploratory navigation (i.e., panning and zooming) by providing simultaneous, synchronized views of the overview and detail of the information space; an example would be the thumbnail views provided in Adobe Reader and Microsoft PowerPoint. Focus + context techniques embed the focus region within the larger information space using a transition function in order to overcome issues with cognitive reorientation between overview and detail. Fisheye lenses (Furnas 1986), implemented using a distortion-based transition function, are a commonly used design choice for focus + context techniques. Evaluations of focus + context techniques, however, suggest that they are not always beneficial (Gutwin and Skopik 2003): object targeting becomes difficult due to “hunting effects” of the fisheye that occur as a result of magnification, and fisheye distortions have also been found to interfere with a user’s spatial comprehension as well as with other tasks involving location recall and visual scanning.

While such interactions are useful for exploring details within the data, perhaps the most critical interaction is that of brushing. Given a statistical graphic, the user can interactively select (brush) data elements, for example, histogram bars, points on a scatterplot, and lines on a parallel coordinate plot. If several different views are linked to the same data source, this brushing will highlight these elements across all views. Monmonier (1989) further expanded this notion by integrating brushing (particularly scatterplot brushing) with maps, calling this a geographic brush: by selecting areas of the map, points in the scatterplot would be highlighted, and vice versa. Examples of brushing in coordinated multiple views are shown in Fig. 6. By employing interaction techniques such as brushing, linking, and drill-down operations, users can explore areas on the graphic and retrieve the exact data values. Furthermore, the use of interactive techniques for exploring data graphics has several advantages over traditional static graphics. The first is that such interactivity allows for greater precision: when analyzing a static map, colors represent a range of values, whereas the addition of interactivity allows one to query the exact value of a region. Second, this interaction not only adds precision but also provides users with a quicker means of data retrieval. Attention is no longer split between looking at the graphic and looking at the legend or scale; instead, users can click regions to retrieve data values.
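A minimal linked-brushing prototype can be built with matplotlib's RectangleSelector; the two point sets below are invented for illustration. Dragging a rectangle in the attribute view highlights the same records in the map view.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import RectangleSelector

# Hypothetical point data set: two attributes plus a location per record.
rng = np.random.default_rng(7)
attr_x, attr_y = rng.normal(size=(2, 200))
lon, lat = rng.uniform(0, 10, size=(2, 200))

fig, (ax_attr, ax_map) = plt.subplots(1, 2, figsize=(9, 4))
sc_attr = ax_attr.scatter(attr_x, attr_y, c=["lightgray"] * 200)
sc_map = ax_map.scatter(lon, lat, c=["lightgray"] * 200)
ax_attr.set_title("attribute view (brush here)")
ax_map.set_title("map view (linked)")

def on_select(eclick, erelease):
    # Records inside the brushed rectangle are highlighted in BOTH views.
    x0, x1 = sorted([eclick.xdata, erelease.xdata])
    y0, y1 = sorted([eclick.ydata, erelease.ydata])
    selected = (attr_x >= x0) & (attr_x <= x1) & (attr_y >= y0) & (attr_y <= y1)
    colors = np.where(selected, "crimson", "lightgray")
    sc_attr.set_color(colors)
    sc_map.set_color(colors)
    fig.canvas.draw_idle()

brush = RectangleSelector(ax_attr, on_select, useblit=True)
plt.show()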
Such techniques all fall under the umbrella of exploratory data analysis, and statisticians within geography have further expanded on these ideas, terming them exploratory spatial data analysis. A review of exploratory spatial data analysis can be found in Anselin (1998) and Andrienko and Andrienko (2006).

Fig. 6 Brushing in coordinated multiple views. (Left) Exploring hospital patient records using linked time series plots. (Right) Exploring criminal incident reports using histograms

For further information, see the chapter ▶ “Exploratory Spatial Data Analysis” by Symanzik and the chapter ▶ “Web-Based Tools for Exploratory Spatial Data Analysis” by Linda See in this handbook. Exploratory spatial data analysis utilizes the same tools and interaction techniques listed above; however, the focus is primarily on spatial or spatiotemporal data. As such, exploratory spatial data analysis systems link tools such as scatterplots, histograms, and others to interactive maps. The links between these statistical graphics are made through brushing and highlighting, where users can interactively select a portion of the data in one view and see these elements highlighted in other views. In this way, users can begin developing and exploring hypotheses about their data. Such tools are then linked to analytic algorithms that can provide deeper insight into data correlations and statistics. These methods focus explicitly on the spatial aspects of the data (spatial dependence, association, and heterogeneity). The goal is to discover spatial patterns and relationships within the data by combining a variety of exploratory data analysis techniques with spatial analysis algorithms and geographic information system tools. Currently, a variety of exploratory spatial data analysis tools are available online, and details of these systems can be found in the Handbook of Applied Spatial Analysis (Fischer and Getis 2010).

5 Exploring Time

The discussion up to this point has focused on visualizations for summarizing data and exploring its spatial components. However, we do not want to constrain geovisualization to only spatial data, as data being collected geographically often contain information about rates, movements, and changes over space and time. Instead, we want to create systems and visualizations that are able to answer questions not only about where but also about when; by combining representations that can answer these sorts of questions, we can begin generating insight into how or why something is occurring. In fact, many questions asked about temporal data have to do with change over time. These patterns are formed by the combination of four characteristics (Few 2009): the magnitude of the change, the shape of the change, the velocity of the change, and the direction of the change. A recent review of temporal visualization methods can be found in Aigner et al. (2008); however, this section will focus on two techniques primarily used for spatiotemporal data (as opposed to strictly temporal data).
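These four characteristics can be made concrete with a few array operations; the monthly series below is invented for illustration.

import numpy as np

# Hypothetical monthly counts for one region.
y = np.array([10, 12, 15, 21, 30, 44, 60, 72, 80, 83, 84, 84.5])

magnitude = y[-1] - y[0]       # overall size of the change
velocity = np.diff(y)          # rate of change per time step
direction = np.sign(velocity)  # +1 rising, -1 falling, 0 flat
# The "shape" of the change is the pattern in the velocity itself,
# e.g., accelerating then leveling off, visible in second differences.
acceleration = np.diff(velocity)

print(f"magnitude={magnitude:.1f}, mean velocity={velocity.mean():.2f}")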

5.1 Animation

One of the most common ways of displaying spatiotemporal data is through the use of animation. Since time moves in a linear fashion, one can animate graphics to show the movement of trends over time. Unfortunately, the animation of choropleth maps brings with it a series of new challenges. In previous sections, a discussion of class interval selection was provided, where the choice of class interval considers data for only one given time interval (or an aggregate thereof).

However, when the statistics are allowed to change temporally, the choice of class intervals becomes increasingly challenging. Now, the class choice must work not only across one map but across multiple maps, and temporally global choices in class selection can emphasize relatively small fluctuations. While such issues complicate the creation of an animated choropleth map, many systems allow for either looped playback animation or user-controlled exploration through an interactive time slider.
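One hedge against this problem is to compute the class breaks once, over all time steps, and hold them fixed during playback. The sketch below (synthetic raster data standing in for areal units; matplotlib's FuncAnimation and BoundaryNorm) illustrates such a temporally global classification.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm
from matplotlib.animation import FuncAnimation

# Hypothetical region-by-time rates: a 10 x 10 grid of "areas" over 20 steps.
rng = np.random.default_rng(3)
frames = rng.gamma(shape=2.0, scale=10.0, size=(20, 10, 10))

# Temporally GLOBAL class intervals: quantile breaks computed once from all
# time steps, so an area's color changes only when it crosses a break.
breaks = np.quantile(frames, [0, 0.2, 0.4, 0.6, 0.8, 1.0])
norm = BoundaryNorm(breaks, ncolors=256)

fig, ax = plt.subplots()
im = ax.imshow(frames[0], cmap="Blues", norm=norm)
fig.colorbar(im, ax=ax)

def update(i):
    im.set_data(frames[i])
    ax.set_title(f"time step {i}")
    return (im,)

anim = FuncAnimation(fig, update, frames=len(frames), interval=300)
plt.show()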

5.2 Space-Time Cube

While animation provides an obvious way to display spatiotemporal data, it also introduces cognitive burdens similar to those discussed earlier: the user must now retain information about the last state of the data visualization and compare it to the current state. One way of removing such a burden is to display both space and time concurrently in a visualization. At the end of the 1960s, Hägerstrand (1970) introduced a geographical technique called the space-time cube. This technique utilizes a three-dimensional diagram (or space-time cube) to visualize spatiotemporal data. The space-time cube consists of a two-dimensional geographical space and a third dimension of time. In this way, spatiotemporal data can be visualized, showing movement patterns and trends in a single graphical representation. Figure 7 is a space-time cube visualization of disease rates, showing clusters over space and time. Like animation, space-time cubes introduce their own cognitive burdens: when rendering in three dimensions, data occlusion can occur, and rotation, panning, and zooming become necessary, thus creating a larger cognitive load.

Fig. 7 An example space-time cube. Disease rates are mapped over space and time to explore how spread vectors may occur and what clusters are potentially of interest
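A basic space-time cube requires nothing more than a 3D scatterplot with time on the vertical axis; the event locations and time stamps below are synthetic.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: registers the 3d projection

# Hypothetical georeferenced events: an (x, y) location plus a time stamp.
rng = np.random.default_rng(5)
x, y = rng.uniform(0, 10, size=(2, 300))
t = rng.uniform(0, 52, size=300)  # e.g., week of the year

fig = plt.figure()
ax = fig.add_subplot(projection="3d")  # the cube: 2-D space + time
ax.scatter(x, y, t, c=t, cmap="viridis", s=10)
ax.set_xlabel("x (easting)")
ax.set_ylabel("y (northing)")
ax.set_zlabel("time")
plt.show()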

6 Conclusion

Currently, data is being generated and collected at unprecedented rates, particularly georeferenced data. Cell phone locations, georeferenced tweets, street cameras, and other sources all provide analysts with new streams of data containing the locations of incidents that may or may not be of interest. Furthermore, with the development and growing popularity of geobrowsers, more and more individuals can now readily map their own georeferenced data. For example, Google Maps provides an API (application programming interface) with which users can quickly plot and browse their georeferenced data in a sophisticated and intuitive user interface. Such tools provide opportunities to explore data ranging from bike traffic routes to crime patterns to individualized restaurant preferences.

In exploring current research in data visualization, it becomes clear that geographical visualization is playing an ever larger role as a central piece of data analysis and exploration. Geovisualization is not purely about creating maps of data. It is about generating tools and techniques for visually representing spatial and spatiotemporal data and facilitating the exploration of this data for hypothesis generation and exploration. The next step in geovisualization is to expand from hypothesis generation and exploration and begin incorporating tools for data modeling and hypothesis testing. Current analysis tools (e.g., Anselin et al. 2006) are incorporating both interactive graphics and advanced statistical analysis algorithms for data exploration and analysis. Given the ability of the human visual system to recognize patterns within data sets, it is imperative to allow analysts to explore and interact with their data. In the problems given here, data analysis was presented as more of a searching and hypothesis generation problem; however, it is often the case that data sets have been generated with specific domain questions in mind. By tailoring analyses, visualizations, and interactions to these domains, it is possible for an analyst to gain more insight into their data than would have been possible with static graphics or traditional tools.

Furthermore, this chapter has discussed only the most basic components of visual analysis. Hypothesis testing algorithms and data mining tools exist that allow analysts to readily sift through their data; these techniques are being coupled with novel graphical displays in an attempt to generate large amounts of insight into data. The details presented here form only the beginnings of geographical visualization. Current research explores geometrical modeling of cartographic techniques such as flow maps. Notations, streamlines, and other visual cues have been added to space-time cube visualizations as a means of enhancing information. Three-dimensional density maps can be easily generated and explored using interactive techniques, and combinations of various textures, glyphs, symbols, and colors are being explored as ways to represent larger amounts of data to a user in a single display. These visuals are used not only in the analysis and exploration process but also as a means of explaining data trends and narratives to others. With geographical visualization, the general populace understands a map, and by linking their data to these sorts of displays, they become invested in the analysis.

However, care must be taken when choosing graphical representations. It is easy to lie with statistics, and as shown here, statistics is one of the foundations of data visualization. As such, we should strive to create reliable and correct representations of data while providing analysts with interactive means of exploring and ultimately analyzing their data.

References

Aigner W, Miksch S, Muller W, Schumann H, Tominski C (2008) Visual methods for analyzing time-oriented data. IEEE Trans Vis Comput Graph 14(1):47–60
Andrienko N, Andrienko G (2006) Exploratory analysis of spatial and temporal data: a systematic approach. Springer, Berlin
Ankerst M, Berchtold S, Keim DA (1998) Similarity clustering of dimensions for an enhanced visualization of multidimensional data. In: IEEE symposium on information visualization, Phoenix, pp 52–62
Anselin L (1998) Exploratory spatial data analysis in a geocomputational environment. In: Longley PA, Brooks SM, McDonnell R, MacMillan B (eds) Geocomputation: a primer. Wiley, Chichester, pp 77–94
Anselin L, Syabri I, Kho Y (2006) GeoDa: an introduction to spatial data analysis. Geogr Anal 38(1):5–22
Bertin J (1967) The semiology of graphics. ESRI Press, Redlands
Cleveland WS (1985) The elements of graphing data. Wadsworth, Monterey
Cleveland WS (1993) Visualizing data. Hobart Press, Summit
Few S (2009) Now you see it: simple visualization techniques for quantitative analysis. Analytics Press, Oakland
Fischer MM, Getis A (eds) (2010) Handbook of applied spatial analysis. Software tools, methods and applications. Springer, Berlin/Heidelberg/New York
Furnas G (1986) Generalized fisheye views. In: Proceedings of ACM CHI, Boston, pp 16–23
Gutwin C, Skopik A (2003) Fisheyes are good for large steering tasks. In: Proceedings of the SIGCHI conference on human factors in computing systems, Ft. Lauderdale, pp 201–208
Hägerstrand T (1970) What about people in regional science? Pap Reg Sci 24(1):6–21
Hardin C, Maffi L (1997) Color categories in thought and language. Cambridge University Press, Cambridge
Harrower MA, Brewer CA (2003) ColorBrewer.org: an online tool for selecting color schemes for maps. Cartogr J 40(1):27–37
Inselberg A (1985) The plane with parallel coordinates. Vis Comput 1(4):69–91
Jul S, Furnas GW (1998) Critical zones in desert fog: aids to multiscale navigation. In: Proceedings of the 11th annual ACM symposium on user interface software and technology, San Francisco, pp 97–106
MacEachren AM (1994) Visualization in modern cartography: setting the agenda. In: MacEachren AM, Taylor DRF (eds) Visualization in modern cartography. Pergamon, Oxford, pp 1–12
MacEachren AM (1995) How maps work. Guilford, New York
Monmonier MS (1972) Contiguity-biased class-interval selection: a method for simplifying patterns on statistical maps. Geogr Rev 62(2):203–228
Monmonier M (1989) Geographic brushing: enhancing exploratory analysis of the scatterplot matrix. Geogr Anal 21(1):81–84
North C, Shneiderman B (2000) Snap-together visualization: evaluating coordination usage and construction. Int J Hum Comput Stud 51:715–739
Pearson K (1895) Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material. Philos Trans R Soc A Math Phys Eng Sci 186:326–343
Shneiderman B (1996) The eyes have it: a task by data type taxonomy for information visualizations. In: Proceedings of the IEEE symposium on visual languages, Los Alamitos, pp 336–343
Sturges HA (1926) The choice of a class interval. J Am Stat Assoc 21(153):65–66
Tufte ER (1983) The visual display of quantitative information. Graphics Press, Cheshire
Tukey JW (1977) Exploratory data analysis. Addison-Wesley, Reading
Wilkinson L (2005) The grammar of graphics. Springer, New York

Web-Based Tools for Exploratory Spatial Data Analysis

Linda See

Contents
1 Introduction
2 Exploratory Spatial Data Analysis
3 Exploratory Spatial Temporal Data Analysis
4 The Evolution of Tools for Exploratory Data Analysis/Exploratory Spatial Temporal Data Analysis
4.1 Early Desktop Applications of ESDA/ESTDA
4.2 Coupling or Extending GIS with ESDA/ESTDA Functionality
4.3 Developments in R Related to ESDA/ESTDA
5 Early Web Developments in ESDA
6 The Move Toward Web-Based Mapping
6.1 Open Source Software and Open Data
6.2 Movement Toward a Service Oriented Architecture
6.3 The Emergence of Commercial Web Mapping Services
6.4 Key Developments in Python and PySal
7 Web-Based Tools for ESDA/ESTDA
7.1 Web Mapping Platforms and Tools with Little ESDA/ESTDA Functionality
7.2 Online Versions of GIS Software
7.3 Building Customized Web Applications
7.4 GeoDa-Web
7.5 CyberGIS
8 Conclusion
9 Cross-References
References

Abstract

This chapter addresses the topic of web-based tools for Exploratory Spatial Data Analysis (ESDA). Early research into ESDA and the development of several standalone applications took place during a time when the internet was still a largely static environment, referred to as Web 1.0. It also took place at a time when Geographic Information Systems (GIS) were lacking in ESDA capabilities, so these applications filled a much-needed gap. With the emergence of interactive web capabilities (Web 2.0), web mapping and web GIS have since become mainstream. This has opened up new possibilities for creating web-based tools for ESDA. The chapter starts with a brief overview of ESDA (including the temporal dimension) and the historical evolution of software applications developed in this field. This is followed by early web developments in ESDA and the movements that resulted in the emergence of web mapping and web GIS. Finally, we present different web-based tools available for ESDA.

L. See (*)
Ecosystem Services and Management Program, International Institute for Applied Systems Analysis (IIASA), Laxenburg, Austria
e-mail: [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_115

Keywords

Spatial data · Exploratory spatial data analysis · Spatial-temporal analysis · Pattern detection · Outliers · Induction · Local statistics · Web-based tools · GIS mapping

1 Introduction

Innovations in technology have radically changed the field of geography over a relatively short period of time. To appreciate this change, we need to go back two to three decades and look to academics such as Stan Openshaw, who at that time talked frequently about the ‘geographic data explosion’. Interestingly, this was well before the buzzword ‘big data’ emerged and before big data became a reality. At the time, Stan was using clustering techniques on UK census data in his early work on geodemographics, i.e., in the development of clusters of socio-economic characteristics from the census and other sources. Geodemographics was then used to understand the composition of neighborhoods before becoming a commercially valuable technology. He worked with matrices on the order of 150,000 small census areas by 50 to 100 variables: this was the ‘big data’ of the 1990s. He also developed cluster hunting machines, i.e., different versions of what he called the Geographical Analysis Machine (GAM), to find cancer clusters in a spatial mass of cancer incidence data. GAM even outperformed other state-of-the-art software algorithms at the time, winning a competition to find hidden cancer clusters. Stan Openshaw was one of the early pioneers in the growing field of exploratory spatial data analysis (ESDA), i.e., research related to discovering patterns in geographical data. Another key player was Luc Anselin at Arizona State University, who developed SpaceStat in the early 1990s. SpaceStat then evolved into GeoDa, first released in 2002, which is a more visual ESDA tool with mapping functionality and linkage between different visual components. Around the same time period, others were also developing standalone ESDA tools, e.g., GeoVISTA Studio by Mark Gahegan’s group at Penn State University, the cartographic data visualizer (cdv) by Jason Dykes, and the Space Time Analysis of Regional Systems (STARS) package by Sergio Rey; see Anselin (2012) for a comprehensive overview.

Coming back to the present day, what has changed since this early work on ESDA? First, big spatial data (Schintler and Fischer 2020) have now become a reality, as has the technology to support them in the form of cloud computing. Secondly, there have been advances in the internet through Web 2.0, which provided the ability to build interactive web applications. Users could now create their own maps, upload data, and run code, but this also led to the emergence of social media. Thirdly, mobile phones, with geolocation capability through Global Positioning System (GPS) receivers and high-resolution cameras, have become ubiquitous. Mobile phone call data records are an example of spatial big data that are being used in innovative ways, e.g., in tracking malaria and mapping poverty, but mobile phone users are also contributing vast amounts of geotagged information, often as part of social media sites. Thus, not only has the amount of data vastly increased, but so has the variety of spatial data, such as the unstructured and heterogeneous content of social media and geotagged photographs. How has ESDA kept up with this “new” type of information? This chapter is intended to provide an overview of the developments that have taken place in the field of ESDA, with an emphasis on the web-based tools that are now available for spatial data exploration. The starting point is a brief introduction to ESDA (including the temporal dimension as ESTDA), followed by a historical overview of software tools that emerged from the mid-1990s onwards as the internet was still evolving. The early web-based developments in ESDA/ESTDA are then presented, followed by the events that led to web-based mapping and web GIS. Finally, we present the web-based tools that are currently available for ESDA and reflect on future developments.

2 Exploratory Spatial Data Analysis

ESDA has its origins in Exploratory Data Analysis (EDA), which comprises a set of tools for finding patterns in data, developed by Tukey (1977). First a data set is described using simple statistics such as the minimum, maximum, median, and upper and lower quartiles, and then visualized using different charts, including stem and leaf displays, boxplots, scatterplots, parallel coordinate plots, etc. From this, outliers can be identified and patterns can be discovered, which may lead to hypotheses that can then be more formally tested. Tukey (1977, p. 806) expressed his belief in EDA as follows: “Exploratory data analysis is an attitude, a state of flexibility, a willingness to look for those things that we believe are not there, as well as those we believe to be there.” Hence one of the core ideas behind EDA is the ability to view the data in different ways with an open mind. ESDA expands upon EDA and is a collection of tools and approaches for the statistical analysis of spatial data (Bivand 2010). Using an interactive environment that facilitates investigation of the data through an iterative process, ESDA combines traditional graphs for visualizing spatial distributions with mapping capabilities, where the data are linked across the different visualizations. Displays can be manipulated interactively, e.g., moving three-dimensional axes around in space.

In this way, outliers can be identified, and spatial patterns can be discovered using clustering, spatial association, and other local statistics to describe the data. Hence ESDA takes the specific nature of spatial data into account. Spatial data may exhibit some unique properties compared to non-spatial data, which require alternative methods and statistics for their treatment. These challenges are listed in Table 1, along with several different tools and techniques employed in ESDA that can be used to address them.

Table 1 The unique characteristics of geographic information and the ESDA tools and statistical approaches that can be used to handle them

- Spatial autocorrelation. Observations at different locations are correlated, violating the assumption of independence required by many traditional statistical methods. Tools: global and local spatial statistics, spatial interpolation, spatial regression, and geographically weighted regression.
- Spatial non-stationarity. A spatial pattern, a relationship, or the parameters in a model may change over space, which would not be captured using a global model alone. Tools: local spatial statistics, spatial kernel density estimation/hotspot mapping, spatial regression, and geographically weighted regression.
- Small area problem. Occurs when either the areas are small or the counts in an area are small. Tools: filtering and smoothing.
- Modifiable areal unit problem (MAUP). The distribution of area-aggregated data may change if a different set of boundaries is used or if the area sizes change. Tools: spatial and space-time scan statistics.
- Large number of variables. Many geographic data sets have a large number of variables, which can cause problems when used with traditional statistical methods, or they simply become too difficult to visualize. Tools: dimension reduction techniques.
- High data volumes. Not only are there many variables, the spatial and temporal resolution of many spatial data sets has also increased, e.g., openly available high-resolution satellite imagery from the Landsat and Sentinel satellites, mobile phone records, geotagged social media, etc. Tools: dimension reduction techniques, spatial kernel density estimation/hotspot mapping (for point data).
- Complexity. Patterns in spatial data and relationships between variables may be complex due to interactions between them, the large number of variables, and the high data volumes. Hence methods are needed that limit the hypothesis space or allow multiple hypotheses and patterns to be tested at the same time in an efficient manner. Tools: dimension reduction techniques.

Source: Author but based on ideas in Guo (2017)

For further information, see Guo (2017) and the chapter ▶ “Exploratory Spatial Data Analysis” by Symanzik in this handbook.

The idea of spatial autocorrelation is clearly captured through Tobler’s First Law, i.e., things that are closer together are more related to one another than things that are further away. Since geographic phenomena may be correlated over space, traditional statistical models such as regression are not directly valid, since they rely on assumptions of independence between observations. Hence tests are needed to determine whether spatial autocorrelation exists, e.g., Moran’s I, Geary’s c, variograms, Ripley’s K function, etc. (Getis 2010). There are also local versions of these statistics available, e.g., local Moran’s I and Local Indicators of Spatial Association (LISA), which allow hotspots to be identified in area-based data. Spatial autocorrelation can also be taken explicitly into account using spatial interpolation techniques, which are used to estimate values at locations where they do not exist in order to create a continuous spatial surface. Kriging, in particular, uses a variogram, a graph that shows the amount of correlation between pairs of points with increasing distance, to determine the distance at which the values are no longer correlated. See the chapter ▶ “Geostatistical Models and Spatial Interpolation” by Atkinson and Lloyd in this handbook for further information.

Related to spatial autocorrelation is spatial non-stationarity, i.e., the idea that spatial patterns may vary over space, so a global model may not capture the spatial patterns adequately. Applied to point data, spatial kernel density estimation can be used to create a density surface so that the underlying patterns or hotspots appear. Models that specifically take spatial autocorrelation and spatial non-stationarity into account include spatial regression, which includes the addition of spatially lagged variables or spatially lagged errors (see the chapter ▶ “Cross-Section Spatial Regression Models” by LeGallo in this handbook), and geographically weighted regression, which provides estimates of local parameters (see the chapter ▶ “Geographically Weighted Regression” by Wheeler in this handbook).

The small area problem refers to the fact that the areas being studied are small in size or that the number of observations of a particular phenomenon in an area is small. Smaller areas tend to be less stable than larger areas and hence have higher variance, while working with small numbers of observations, e.g., a rare disease rate, may not produce reliable statistics. As ESDA is heavily rooted in visualization, many of the software tools that have been developed for ESDA can be used to identify outliers. Different types of filters can then be applied to extract specific features. Spatial smoothing can reduce the effect of outliers and help to highlight patterns in the data; by taking the neighborhood into account, it helps to reduce the problems related to small areas.

The Modifiable Areal Unit Problem (MAUP) has long been recognized as a problem that arises when spatial data are aggregated to areas or zones. There are both scale and aggregation effects that contribute to the MAUP: the scale effect refers to the size of the zones, while the aggregation effect refers to how the boundaries of these zones are delineated. Changes in either may affect the results of any analysis.
As with the small area problem, the visualization component embedded within ESDA tools makes them well suited for investigating the MAUP in an iterative fashion. Note that the MAUP is part of the larger ecological fallacy problem, which highlights the potential issues that may occur when making inferences about individuals based on area-aggregated data. For example, an unemployment rate for a county may not reflect what is happening in a smaller area or at an individual level. Hence tools are needed that specifically consider the MAUP, such as spatial and space-time scan statistics, of which Stan Openshaw’s GAM is an early example. Applied to point data, local statistics are calculated in windows of different sizes to look for patterns in space and time. See the chapter ▶ “Scale, Aggregation, and the Modifiable Areal Unit Problem” by Manley in this handbook for further details of this phenomenon.

A large number of variables, high data volumes, and complexity are all challenges related to spatial big data (Schintler and Fischer 2020), which have resulted from shifts in data and information generation. These include improvements in computing storage and power, sensor networks enabled through the Internet of Things (IoT), the ubiquity of location-based mobile devices, the explosion of social media, and the recent phenomenon of crowdsourcing and user-generated data. Hence spatial big data are also characterized by heterogeneity due to the different data formats, the varying spatial and temporal scales, and the often unstructured nature of the data, e.g., from social media. Additional challenges arise when data are combined from sources that are distributed spatially. Although spatial big data require new and diverse approaches for their pre-processing and handling, ESDA does have some methods to offer, in particular through dimension reduction techniques. For example, there are numerous partitioning and hierarchical clustering methods available that collapse the original data set into a smaller set of classes or types, either using location as an explicit input or not. Principal component analysis and self-organizing maps (SOMs) are other examples of dimension reduction techniques that can be applied prior to visualization or as part of the visualization process (e.g., SOMs). Coupled with data mining tools, ESDA can also help to uncover spatial patterns in this new area of spatial big data.
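As a small illustration of the global spatial statistics mentioned above, the sketch below computes global Moran's I from first principles (numpy only, a binary contiguity weights matrix, no row standardization and no inference); dedicated packages such as PySAL provide tested implementations with significance tests.

import numpy as np

def morans_i(values, w):
    # Global Moran's I for values at n locations, given an n x n
    # spatial weights matrix w (here 1 for neighbors, 0 otherwise).
    values = np.asarray(values, dtype=float)
    n = len(values)
    z = values - values.mean()
    s0 = w.sum()
    return (n / s0) * (z @ w @ z) / (z @ z)

# Toy example: four areas on a line; neighbors share an edge.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
smooth = np.array([1.0, 2.0, 3.0, 4.0])  # spatially ordered values
print(morans_i(smooth, w))  # positive: similar values cluster in space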

3 Exploratory Spatial Temporal Data Analysis

ESDA is often concerned with multivariate spatial data, but spatial and space-time scan statistics (Table 1) additionally consider the temporal dimension. Applications such as STARS by Sergio Rey represent an attempt to bring in this fourth dimension explicitly, in what is referred to as Exploratory Spatial Temporal Data Analysis (ESTDA). For more information, see the chapter ▶ “Spatial Dynamics and Space-Time Data Analysis” by Rey in this handbook. The review paper by An et al. (2015) also provides a good overview of the different methods available, including explanatory methods, statistical models, and simulations. Here we focus only on the explanatory methods. The data are divided into two broad categories. The first type is movement data: vectors, or trajectories, that capture the geographic movements of individual objects, such as people or cars, over time.


are collapsed to origins and destinations, they are referred to as Origin-Destination (OD) flow data or, from this point onwards, simply as flow data. The second type of data is spatial panel data, i.e., time series of data for spatial units such as pixels, polygons or points at which data are collected or measured. Table 2 provides a brief description of the different explanatory methods available; the path descriptors listed there are illustrated in the sketch following the table. Tools for visualization are also a key component of space-time analysis for both movement/flow and spatial panel data. Many of the approaches for exploring spatial panel data are adapted from ESDA, while the exploration of movement data has required the development of some new approaches due to the unique nature of the data. Moreover, the availability of large quantities of movement/flow data is a relatively new phenomenon, owing to mobile phones (both GPS, which records movements, and call data records), low-cost sensors, smart cards that can track individuals, e.g., movements through public transport systems, and geo-tagged social media. Often noisy and/or unstructured, these spatial big data require the application of pre-processing tools before they can be explored, such as natural language processing, filtering and data reduction.

Table 2 Overview of approaches used for ESTDA

Movement and flow data
- Time geography: different 3-dimensional (3D) visualizations that show bounds on movements, e.g., future movement possibilities, the area bounded by a start and end location, the occurrence of multiple path overlaps, etc.
- Path descriptors: descriptors such as velocity, acceleration and turning azimuth, used to produce global averages or values by interval
- Path similarity indices: indices that use different distance measures to indicate the similarity between two trajectories
- Pattern and clustering methods: the application of different clustering methods to features in the movement or flow database, e.g., path similarity indices; other approaches have been developed, e.g., alignment approaches from DNA (deoxyribonucleic acid) sequencing and principal component analysis to discover individual behavior patterns
- Spatial field and range methods: movements are represented as rasters (fields) or polygons (ranges of object movement) to identify areas visited frequently or to examine an object’s mobility and use of space; different methods are then applied, including global/local spatial statistics, kernel density estimation, etc.

Spatial panel data
- Metrics and tests related to space-time analysis: global and local spatial statistics, Markov methods, measures of inequality, different methods of hotspot analysis, and statistical tests, extended to space and time

Source: Author, based on concepts and ideas in An et al. (2015) and Long and Nelson (2013)
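As a hedged illustration of the path descriptors row in Table 2, the sketch below derives per-segment speed and turning azimuth from a timestamped trajectory using NumPy; the coordinates are assumed to be in a projected (metric) system, and the sample points are purely illustrative.

```python
# Sketch: basic path descriptors (speed and turning azimuth) for a single
# trajectory, following the "path descriptors" row of Table 2. Coordinates
# are assumed to be projected (meters); the sample points are illustrative.
import numpy as np

# Illustrative trajectory: columns are x (m), y (m), t (s)
traj = np.array([[0.0, 0.0, 0.0],
                 [40.0, 10.0, 10.0],
                 [90.0, 15.0, 20.0],
                 [120.0, 60.0, 30.0]])

dx, dy, dt = np.diff(traj[:, 0]), np.diff(traj[:, 1]), np.diff(traj[:, 2])

step_length = np.hypot(dx, dy)                   # segment lengths (m)
speed = step_length / dt                         # per-segment speed (m/s)
azimuth = np.degrees(np.arctan2(dx, dy)) % 360   # heading clockwise from north

# Turning angle between consecutive segments, wrapped to (-180, 180]
turning = (np.diff(azimuth) + 180) % 360 - 180

print("mean speed (m/s):", speed.mean())
print("turning angles (deg):", turning)
```

Such per-segment descriptors can then be averaged globally or binned by interval, as the table describes, or fed into the path similarity and clustering methods listed in the subsequent rows.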


4


The Evolution of Tools for Exploratory Spatial Data Analysis/Exploratory Spatial Temporal Data Analysis

Before we consider web-based tools for ESDA/ESTDA, it is important to understand the evolution and technological developments that occurred in the field of ESDA in the past. The reader is referred to Anselin (2012) for a good overview of the historical developments in ESDA software and to the chapter ▶ “Exploratory Spatial Data Analysis” by Symanzik in this handbook for a list of standalone software applications of ESDA.

4.1

Early Desktop Applications of ESDA/ESTDA

Standalone, desktop-based applications began to appear around the early 1990s (Table 3). They were designed to fill a much-needed gap in ESDA, particularly in terms of visualization, because GIS applications at the time were not able to fulfill these needs. Table 3 lists some of these early applications, including the lead proponent and key publications. The reader is also referred to Goovaerts (2010), who provides a list of geostatistical software; these are largely focused on providing spatial interpolation functionality, but some of the packages listed in his chapter also have more generic ESDA capabilities. Many of these packages were created at a time when the technology for developing interactive web-based applications was still in its infancy and hence too immature for supporting complex ESDA or ESTDA.

Table 3 Some early desktop-based ESDA/ESTDA tools

Tool/Application                               Lead scientist(s)               Key publications
SPIDER                                         John Haslett                    Haslett et al. (1990)
REGARD                                         Anthony Unwin                   Unwin et al. (1990)
Cartographic data visualizer (cdv)             Jason Dykes                     Dykes (1998)
SAGE                                           Robert Haining                  Haining et al. (1998)
Descartes (later CommonGIS, then V-Analytics)  Gennady and Natalia Andrienko   Andrienko and Andrienko (2002); Andrienko et al. (2002)
GeoVISTA Studio                                Mark Gahegan                    Takatsuka and Gahegan (2002)
GeoViz Toolkit (GVT)                           Frank Hardisty                  Hardisty and Robinson (2011)
GeoDa                                          Luc Anselin                     Anselin et al. (2010)
ESTAT                                          Anthony Robinson                Robinson (2007)
STARS                                          Sergio Rey                      Rey and Janikas (2010)
CrimeStat                                      Ned Levine                      Levine (2006)
SaTScan                                        Martin Kulldorff                Kulldorff et al. (2005)
Space-time intelligence system (STIS)          Geoffrey Jacquez                Jacquez (2010)
VIS-STAMP                                      Diansheng Guo                   Guo et al. (2006)
GeoSurveillance                                Gyoungju Lee                    Lee et al. (2010)

Source: Author


The data sets for analysis were also small in comparison to today, so a desktop solution was more than adequate for data analysis. Of these packages, it is still possible to download V-Analytics, GeoVISTA Studio, GeoDa, CrimeStat, SaTScan and GeoSurveillance. Of these, only GeoDa, CrimeStat and SaTScan are supported as desktop applications, while V-Analytics still runs but has an old look and feel, reflecting the technology used in its development. GeoVISTA Studio formed the basis for the newer GeoViz Toolkit (GVT) and is available from GitHub and the Google Code Archive. The SaTScan software was updated in 2018, while the GeoSurveillance pages do not appear to have been updated for some years now. The STARS application is available from its GitHub repository, where the last developments took place between three and five years ago. The STIS system has been replaced by three applications called SpaceStat, ClusterSeer and BoundarySeer, which are proprietary products sold by BioMedware.

4.2

Coupling or Extending GIS with ESDA/ESTDA Functionality

In addition to building standalone applications, other developments in the past involved either coupling applications with a GIS or building plug-ins that enhanced the ESDA/ESTDA capabilities of a GIS. Examples of coupling are the predecessor to GeoDa, called SpaceStat, and XGobi/Xplore, which were coupled with ESRI’s now obsolete ArcView program, as well as SAGE, which was coupled with ESRI’s ArcInfo, a much older version of the ArcGIS package. An example of a plug-in is the Extended Time-Geographic Framework Tool, which appears as a new toolbar in ArcGIS and allows users to undertake ESTDA within this GIS package. Although originally developed around 2007, this toolbar is still available for download and installation with ArcGIS v10.2, which was released in 2013, as well as with older versions of the software. ESRI also began to add ESDA functionality to ArcGIS in the form of toolboxes, including spatial interpolation (available from 2001), spatial statistics (available from 2004) and geographically weighted regression (available from 2009). Moreover, ESRI implemented some of the brushing and linking functions found in other desktop ESDA programs and hence was beginning to compete with them. However, as a proprietary package, ArcGIS requires the purchase of a license, unlike most of the standalone ESDA tools.

4.3

Developments in R Related to ESDA/ESTDA

Another development happening in parallel has been more on the open source side. The R programming language was released in 1995, containing routines for statistical and data analysis, as well as visualization. Using the R statistical software environment requires some knowledge of programming, but it can be used to produce customized applications, an advantage over the desktop-based programs. Even Luc Anselin recognized that R and GeoDa were highly complementary, and


that more advanced users would move to R once they became familiar with ESDA techniques through GeoDa. Since then, the software has continued to evolve, with more than two million users worldwide. To expand the functionality of R, users actively contribute their own packages for use by others. Roger Bivand wrote the first version of the spdep package (Spatial Dependence: Weighting Schemes, Statistics and Models) in 2002, which he continues to develop, releasing the latest version in February 2019. This package contains tests for spatial autocorrelation, the calculation of local statistics, and functions for estimating spatial regression models. A package for geographically weighted regression, spgwr, was developed by Roger Bivand and colleagues, while a more generalized package for geographically weighted models, called GWmodel, which includes GWR functionality, was released in 2014 through a collaboration between the University of Bristol, Wuhan University, NUI Maynooth and Rothamsted Research in the UK. The gstat package, maintained by Edzer Pebesma, developed out of a standalone desktop program for geostatistical modeling, including spatial interpolation, and also contains functions for visualizing the data. Both the GWmodel and gstat packages continue to be developed, with recent updates in 2019. There is also a mapping package in R called ggmap, which contains a set of functions that can be used to visualize data overlaid on maps from online sources, e.g., Google Maps (via the RgoogleMaps package). R code can also be embedded in websites, and there are other solutions, including R-Fiddle, an online version of R. It is therefore possible to link the ESDA/ESTDA-relevant functions found in R to create customized web-based tools. Although developed in neither R nor open source software, it is also worth mentioning that ESDA tools were developed in MATLAB by James LeSage in the late 1990s, so a range of alternatives was available. Having given a brief overview of the historical desktop-based ESDA/ESTDA software and the parallel developments in R and MATLAB, we now turn to web-based ESDA.

5

Early Web Developments in ESDA

Applets were one of the early ways in which interaction was introduced into websites. These were written in the Java programming language and ran in a web browser, so they did not require any installation. There were some notable early attempts at using applets for ESDA on the web. For example, in the late 1990s, Gennady and Natalia Andrienko modified their Descartes software (Table 3) to run on the web through a Java applet. Sitting in the back end were the Descartes server and the Descartes application. The user interacted with the Descartes applet on the client side, and the information needed to drive the application moved back and forth between the client and server using a standard internet protocol, i.e., the Hypertext Transfer Protocol (HTTP). However, there were limitations: it was not possible to deal with files larger than one MB; the applet itself needed to be small, which limited the amount of functionality it could contain; the speed of


the applet depended on the hardware and software platforms; and there were security issues around using Java and applets. Another example was undertaken by Luc Anselin and colleagues in 2004, who used an applet to show how outliers could be visualized online in choropleth maps. They demonstrated how a small number of ESDA methods could be deployed over the internet using data on homicides and cancer rates in the US, displayed using both map- and graph-based visualizations. Although there had been technological advances since the endeavors of the Andrienkos, they had similar experiences. For example, they noted that the amount of functionality in the applet was always going to be limited, partly because they encountered performance problems when downloading the applets and when working with medium- to large-sized data sets. Moreover, they highlighted the fact that Java is not an optimal language for computationally demanding tasks. Note that the technology to support Java applets is no longer available in the most up-to-date web browsers due to security issues, and hence applets can no longer run. It was also clear at the time, however, that applets were not going to be a long-term or sustainable solution for bringing ESDA tools to the internet. In the early 2000s, researchers at Penn State demonstrated how GeoVISTA Studio could be used to create Web GIS applications that did not use an applet or plug-in. They showed an example using census data from the US and how this could be turned into a component for use in a larger web application. GeoVISTA Studio has since evolved into the GeoViz Toolkit, developed further by Frank Hardisty of Penn State. The GeoViz Toolkit has been used, for example, in a mini-challenge on analyzing cyber security data, as well as in other web-based applications.

6

The Move Toward Web-Based Mapping

During the last decade, access to geographic information has changed radically as web mapping and web GIS have become mainstream. Using a web browser, information that is distributed across multiple locations can be brought together and accessed widely by the general public. A number of technological and social developments have played an important role in the emergence of web mapping and web GIS, and hence the availability of web-based ESDA tools. These are outlined in more detail below.

6.1

Open Source Software and Open Data

There has been a rapid growth in the open source community over the last three decades, moving GIS from a few commercial vendors back in the 2000s to a current landscape where several open source GIS packages are now available as well as numerous R libraries for spatial analysis and mapping. At the same time, there has been a trend toward the opening up of spatial data sets, e.g., through open data


government initiatives, the availability of free satellite imagery from the Landsat and Sentinel satellites, bottom-up projects like OpenStreetMap, data from sensor networks and the Internet of Things, and even some commercial data. In order to share these open data sets in a distributed way, open standards are needed. The Open Geospatial Consortium (OGC) has been driving the development of these open standards over the last 25 years, which has clearly contributed to the success of web-based mapping and web GIS.

6.2

Movement Toward a Service Oriented Architecture

A Service Oriented Architecture (SOA) is one composed of individual services that each perform a certain function; they are self-contained pieces of code that act as black boxes to their users and can be accessed remotely. SOA can be implemented using web services so that the services can be accessed over the internet, independently of any programming language or platform. Web Map Services (WMS) were developed by the OGC as a standardized protocol for serving georeferenced maps as images in a web browser. These images are typically bitmaps in different formats. The latest version of the protocol was published in 2006. Other complementary services were also developed, including Web Feature Services (WFS) for serving vector data, Web Coverage Services (WCS) for raster data, and Web Processing Services (WPS) for running processes over the internet, e.g., launching a clustering algorithm. These services can be integrated into a web-based GIS application, which can access data housed in different locations, process the information using a set of available WPS, and then display the information to the user in an interactive way. Many web-based mapping applications use these OGC services to access spatial data. Web services can be implemented using a range of different technologies. One of the most common is the Representational State Transfer (REST) architecture, which often uses the HTTP protocol, as shown in Fig. 1. Application Programming Interfaces (APIs) consist of the code that allows two applications to communicate: HTTP requests go from the client, which can be a web browser or an application on a mobile device, to a GIS server, which then processes the data, produces the output, and communicates back with the client through an HTTP response; the client then displays the results.

Fig. 1 A server-client architecture that uses Representational State Transfer (REST) and HTTP for communication. (Source: Author)


The API, which sits on the server side, is the code that handles this operation. RESTful APIs are often used in web-based and cloud applications because they use less bandwidth than other architectures, making them well suited to internet usage. For example, ESRI’s ArcGIS has a REST API for accessing ArcGIS’s library of spatial mapping services and data. There are now many other APIs for spatial web services available, which has facilitated the building of web-based applications.
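As a concrete, hedged illustration of the request–response pattern described above, the sketch below issues an OGC WMS 1.3.0 GetMap request over HTTP using the Python requests library. The endpoint URL and layer name are hypothetical placeholders, while the parameter names follow the OGC WMS standard.

```python
# Sketch: requesting a map image from a (hypothetical) OGC Web Map Service
# via a standard WMS 1.3.0 GetMap request over HTTP.
import requests

WMS_URL = "https://example.org/geoserver/wms"  # hypothetical endpoint

params = {
    "SERVICE": "WMS",
    "VERSION": "1.3.0",
    "REQUEST": "GetMap",
    "LAYERS": "demo:regions",        # hypothetical layer name
    "STYLES": "",
    "CRS": "EPSG:4326",
    "BBOX": "40.0,-10.0,60.0,10.0",  # miny,minx,maxy,maxx for EPSG:4326
    "WIDTH": "800",
    "HEIGHT": "600",
    "FORMAT": "image/png",
}

response = requests.get(WMS_URL, params=params, timeout=30)
response.raise_for_status()

with open("map.png", "wb") as f:
    f.write(response.content)  # the returned bitmap, ready for a web client
```

The same client-side pattern applies to WFS, WCS and WPS endpoints; only the REQUEST parameters and the content type of the response change.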

6.3

The Emergence of Commercial Web Mapping Services

A third trend has been the development of web mapping services by commercial companies such as Google and Microsoft. In 2005, Google released Google Maps, providing the general public with easy access to satellite imagery and road networks through a web browser. Google Earth and NASA World Wind added 3D visualization capabilities. Although clearly not GIS, these developments were, nevertheless, instrumental in opening up large data sets through APIs. This enabled web developers to create mashups, i.e., to bring together data sets from more than one source to enrich the information content. Applications also appeared that allowed members of the public to make their own maps and mashups. Thus, the public has become more and more familiar with the sight of maps on the web.

6.4

Key Developments in Python and PySal

Python is a high-level, general-purpose programming language that has become one of the most widely used languages in the world of GIS. Adoption grew after the year 2000, when version 2.0 was released and the language became easier to use. The syntax of the language reads very much like English, which makes it an easy language in which to program. Python is often used as a scripting language, allowing for batch programming. This is very much in line with the needs of GIS users, who want to run spatial operations many times. ESRI’s ArcGIS API for Python provides access to a library of analysis tools for working with maps and spatial data over the internet. QGIS, a free and open source software application, also uses Python for scripting and can be used to create web-based mapping applications in a browser; other GIS packages have followed suit. An important development for ESDA that has used Python is PySAL (Rey and Anselin 2010), the Python-based Spatial Analysis Library, first released in the summer of 2010. It was originally conceived through discussions that took place in the early 2000s between Luc Anselin and Sergio Rey at Arizona State, as lead proponents of the GeoDa and STARS applications, respectively. They realized the advantages of working together by sharing algorithms and data structures to create a single library that both could use, and that could be opened up to others in the community. PySAL is comprised of a number of different components, as shown in Fig. 2.


Fig. 2 The components of PySAL, organized by data analytic functions, ESDA and modeling functions. (Figure created by the author based on a written description in Rey and Anselin 2010)

There are several data analytic functions that are intended to make the reading, writing and manipulation of spatial data easier. The geometric summaries of spatial patterns include the creation of Thiessen polygons, minimum spanning trees and convex hulls, which can be used to derive spatial weight matrices. The other components of PySAL relate to ESDA and functions for modeling. For example, there is a module called esda, which contains routines for calculating global and local spatial statistics along with numerous classification schemes for choropleth maps, and another called spatial dynamics, which is focused on spatio-temporal analyses. The module spreg handles spatial regression, while inequality has routines for calculating measures such as the Gini coefficient, all of which are of interest to users undertaking ESDA. Updates are made to PySAL every six months, and it is available at https://pysal.org/. PySAL is flexible and can be deployed in three ways: (a) to create standalone desktop-based software applications; (b) to develop extensions and plug-ins to GIS software; and (c) for integration into a web-based application. More details on how PySAL is being used are provided in the next section.
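As a hedged illustration of how the modeling side of PySAL might be used, the sketch below fits a spatial lag model with the spreg component on synthetic lattice data; the variable names and coefficients are invented for the example.

```python
# Sketch: fitting a spatial lag model with PySAL's spreg on synthetic data.
# The lattice, variable names and coefficients are invented for illustration.
import numpy as np
from libpysal.weights import lat2W
from spreg import ML_Lag

rng = np.random.default_rng(42)

# Rook-contiguity weights for a 10 x 10 lattice of areal units
w = lat2W(10, 10)
w.transform = "r"
n = w.n

x = rng.normal(size=(n, 2))                  # two explanatory variables
y = 1.0 + x @ np.array([[0.5], [-0.3]]) \
        + rng.normal(scale=0.5, size=(n, 1))  # synthetic response

model = ML_Lag(y, x, w=w, name_y="outcome", name_x=["x1", "x2"])
print(model.summary)  # includes rho, the spatial autoregressive coefficient
```

The same inputs, swapped into the esda routines shown earlier, cover the exploratory side; spreg then handles the confirmatory modeling step.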

7

Web-Based Tools for ESDA/ESTDA

Fifteen to twenty years ago, there were dozens of different standalone desktop programs available for ESDA/ESTDA and geostatistical analysis. The situation for the user is now much more varied. Web mapping has become a common tool, often embedded seamlessly in many websites, employing web map services, cloud-based solutions, and APIs to integrate data from multiple sources. Desktop-based software still exists, but web-based tools are becoming more


prevalent. In this section, we examine the current status of the web-based tools that are available for ESDA and ESTDA.

7.1

Web Mapping Platforms and Tools with Little ESDA/ESTDA Functionality

Several cloud-based mapping platforms are now available, which allow users to build their own web-based applications or add interactive mapping to a website. Examples include Google Maps, CARTO, MangoMap, Mapbox, Mapzen and NextGIS. Mapsense was another example but has since been purchased by Apple; the commercial landscape of web map service providers is thus changing rapidly. Leaflet and OpenLayers are two widely used open source JavaScript libraries for creating web mapping applications, while GC2 is an open source project for managing geospatial data, making map visualizations and creating applications. However, none of these has tools for ESDA like those found in, for example, PySAL, as these platforms are meant for the development of simple, yet effective, visualizations.

7.2

Online Versions of GIS Software

GIS software has advanced considerably over the last 20 years in terms of ESDA/ESTDA capabilities. The trend to move online means that this functionality is now becoming available in web-based tools. For example, ArcGIS Online is the cloud-based version of the ArcGIS software package. QGIS Cloud is a web-based GIS platform for publishing maps, data and services on the internet with all the capabilities of QGIS. WebGIS is the online web mapping offering of Hexagon Geospatial, another commercial vendor in the GIS world (whose desktop product was previously known as GeoMedia). However, even with GIS moving online, not all the functionality required for ESDA is present, i.e., GIS still lags behind a package like GeoDa in what it offers for ESDA. Solutions are available through PySAL, e.g., by writing toolbars that can be accessed directly in ArcGIS. There are also plans to include functions from the PySAL library directly as core processing tools in QGIS. But it will take time before full ESDA functionality is available in online versions of current GIS packages.

7.3

Building Customized Web Applications

Users can now build their own customized web-based applications for ESDA/ESTDA using the vast collection of open source libraries for spatial data handling that are now available, the open APIs to access these libraries, and many open data sets. As mentioned previously, R functions can be embedded in websites. Moreover, the development of spatially indexed databases such as PostGIS (a spatial database


extending the PostgreSQL object-relational database) has made it easier to store and access spatial data in web-based applications. Another interesting recent development relevant to ESDA, and one that is increasing in popularity, is the Jupyter notebook. This is an open-source web application for sharing data, programming code, visualizations and text in a single location. These notebooks can be shared, and others can build on them or simply use them as a way of examining scientific workflows, e.g., to reproduce the work of others. Running in a web browser, users can document their ESDA using routines found in the PySAL library, for example. The PySAL Notebooks project, organized by the University of Liverpool, provides the resources for users to build their own workflows and applications. Jupyter notebooks can also be shared online as web-based tools that support ESDA. Moreover, Jupyter notebooks can access GIS packages such as ArcGIS and GRASS through their Python APIs, exposing their spatial analytical tools.
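As one small, hedged example of this kind of customization, the sketch below uses the folium library, a Python wrapper around the Leaflet JavaScript library mentioned earlier, to publish an interactive choropleth as a self-contained HTML page; the input files and column names are hypothetical placeholders.

```python
# Sketch: a minimal custom web map built with folium, a Python wrapper around
# the Leaflet JavaScript library. The input files ("regions.geojson",
# "rates.csv") and their columns are hypothetical placeholders.
import folium
import pandas as pd

data = pd.read_csv("rates.csv")  # expected columns: region_id, rate

m = folium.Map(location=[51.5, -0.12], zoom_start=9)

folium.Choropleth(
    geo_data="regions.geojson",             # polygon boundaries
    data=data,
    columns=["region_id", "rate"],
    key_on="feature.properties.region_id",  # join key in the GeoJSON
    fill_color="YlOrRd",
    legend_name="Rate per 1,000 population",
).add_to(m)

m.save("index.html")  # a self-contained, browser-ready web map
```

A script of this kind can also live inside a Jupyter notebook, where the map renders inline and the whole workflow can be shared for others to reproduce.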

7.4

GeoDa-Web

GeoDa-Web is seen by Luc Anselin as the next generation of GeoDa, aimed at addressing challenges related to the analysis of big data, the integration of open data, and the sharing of results through social media. It will also fill the current gap left by web mapping platforms, which provide little more than very simple ESDA or spatial analysis. The architecture of GeoDa-Web integrates cloud-based software services (e.g., Google Maps, CARTO, Mapzen) with various APIs for data access, spatial analysis, visualization and publication of the results. GeoDa-Web accesses spatial analytical web services through the PySAL REST API and the GeoDa Cloud API to run the relevant ESDA/ESTDA functions. Visualization uses a hybrid approach, combining the drawing features of HTML5 with cloud mapping platforms for the fast display of results as map tiles. Once the results are produced, e.g., maps, graphs or text, they can be published on social networking sites. The system is currently under development (https://spatial.uchicago.edu/geoda-web), but demonstrations of its functionality can be found at: http://lixun910.tumblr.com/.

7.5

CyberGIS

CyberGIS is the combination of GIS (and GIScience) with a cyberinfrastructure that facilitates geographical knowledge discovery and provides solutions to complex problems through process modeling. Cyberinfrastructure in the context of GIS refers to three main components: (i) the spatial data; (ii) the geoprocessing; and (iii) the software systems needed to integrate the first two resources in a seamless fashion. Both the spatial data and the processing are often distributed across multiple locations. The spatial data sets can be those found in distributed data repositories and spatial data infrastructures, or accessed via various APIs from sensor networks, social media and other new and emerging sources of big data. Another key driver


behind a cyberinfrastructure is the need to facilitate collaboration between scientists, promoting an interdisciplinary approach. CyberGIS has primarily been driven by the challenges arising from spatial big data, but also by the need for computational scalability through high-performance computing and the parallelization of algorithms. Significant challenges remain in terms of geospatial interoperability and open data integration. The CyberGIS Gateway, funded by the US National Science Foundation, is an example of a current CyberGIS in operation. It contains a set of scalable spatial analysis and modeling applications supported by the GISolve middleware and two supercomputers. The CyberGIS Gateway currently has one component relevant to ESDA, i.e., CG-PySAL, which provides access to the spreg module of the PySAL library. This contains functions for spatial econometrics, including spatial regression, but other ESDA modules could be added in the future. This is clearly an area of growth, but one that is still in its infancy.

8

Conclusion

The field of ESDA and ESTDA has seen an evolution from the early 1990s onwards, when standalone desktop applications dominated. These applications were designed to fill a gap in GIS software, which was simply not capable of the types of analyses and linked visualizations required at the time. Web-based mapping and web GIS have since become mainstream, and the tools now available for ESDA and ESTDA are much more varied. Users can still run desktop software such as GeoDa, which continues to be supported, or choose an online GIS with some ESDA/ESTDA capabilities. The trend for GIS software to move toward cloud-based platforms will continue, particularly as the demand for exploring and analyzing large, distributed data sets increases. Users can also build and customize their own applications using the many web services and APIs now available, or embed R functions into web browsers. Big spatial data still present many challenges, and new solutions such as GeoDa-Web and CyberGIS are on the horizon. However, developing ESDA methods for visualizing massive data sets remains a key challenge, and this challenge is magnified where the data are distributed across many locations.

9

Cross-References

▶ Cross-Section Spatial Regression Models ▶ Exploratory Spatial Data Analysis ▶ Geographically Weighted Regression ▶ Geostatistical Models and Spatial Interpolation ▶ Scale, Aggregation, and the Modifiable Areal Unit Problem ▶ Spatial Dynamics and Space-Time Data Analysis


References

An L, Tsou M-H, Crook SES, Chun Y, Spitzberg B, Gawron JM, Gupta DK (2015) Space–time analysis: concepts, quantitative methods, and future directions. Ann Assoc Am Geogr 105(5):891–914
Andrienko G, Andrienko N (2002) Intelligent support of visual data analysis in Descartes. In: Proceedings of the 2nd international symposium on smart graphics – SMARTGRAPH’02. ACM Press, Hawthorne/New York, pp 21–26
Andrienko N, Andrienko G, Voss H, Bernardo F, Hipolito J, Kretchmer U (2002) Testing the usability of interactive maps in CommonGIS. Cartogr Geogr Inf Sci 29(4):325–342
Anselin L (2012) From SpaceStat to CyberGIS: twenty years of spatial data analysis software. Int Reg Sci Rev 35(2):131–157
Anselin L, Syabri I, Kho Y (2010) GeoDa: an introduction to spatial data analysis. In: Fischer MM, Getis A (eds) Handbook of applied spatial analysis. Springer, Berlin/Heidelberg, pp 73–89
Bivand R (2010) Exploratory spatial data analysis. In: Fischer MM, Getis A (eds) Handbook of applied spatial analysis. Springer, Berlin/Heidelberg, pp 219–254
Dykes J (1998) Cartographic visualization: exploratory spatial data analysis with local indicators of spatial association using TCL/TK and cdv. Statistician 47(3):485–497
Getis A (2010) Spatial autocorrelation. In: Fischer MM, Getis A (eds) Handbook of applied spatial analysis. Springer, Berlin/Heidelberg, pp 255–278
Goovaerts P (2010) Geostatistical software. In: Fischer MM, Getis A (eds) Handbook of applied spatial analysis. Springer, Berlin/Heidelberg, pp 125–134
Guo D (2017) Exploratory spatial data analysis. In: Richardson D, Castree N, Goodchild MF, Kobayashi A, Liu W, Marston RA (eds) International encyclopedia of geography: people, the earth, environment and technology. Wiley, Oxford, pp 1–25
Guo D, Chen J, MacEachren AM, Liao K (2006) A visualization system for space-time and multivariate patterns (VIS-STAMP). IEEE Trans Vis Comput Graph 12(6):1461–1474
Haining RP, Wise SM, Ma J (1998) Exploratory spatial data analysis in a GIS environment. Statistician 47(3):457–469
Hardisty F, Robinson AC (2011) The GeoViz Toolkit: using component-oriented coordination methods for geographic visualization and analysis. Int J Geogr Inf Sci 25(2):191–210
Haslett J, Wills G, Unwin A (1990) SPIDER – an interactive statistical tool for the analysis of spatially distributed data. Int J Geogr Inf Syst 4(3):285–296
Jacquez GM (2010) Space-time intelligence system software for the analysis of complex systems. In: Fischer MM, Getis A (eds) Handbook of applied spatial analysis. Springer, Berlin/Heidelberg, pp 113–124
Kulldorff M, Heffernan R, Hartman J, Assunção R, Mostashari F (2005) A space–time permutation scan statistic for disease outbreak detection. PLoS Med 2:e59
Lee G, Yamada I, Rogerson P (2010) GeoSurveillance: GIS-based exploratory spatial analysis tools for monitoring spatial patterns and clusters. In: Fischer MM, Getis A (eds) Handbook of applied spatial analysis. Springer, Berlin/Heidelberg, pp 135–150
Levine N (2006) Crime mapping and the CrimeStat program. Geogr Anal 38(1):41–56
Long JA, Nelson TA (2013) A review of quantitative methods for movement data. Int J Geogr Inf Sci 27(2):292–318
Rey S, Anselin L (2010) PySAL: a Python library of spatial analytical methods. In: Fischer MM, Getis A (eds) Handbook of applied spatial analysis. Springer, Berlin/Heidelberg, pp 175–195
Rey S, Janikas MV (2010) STARS: space-time analysis of regional systems. In: Fischer MM, Getis A (eds) Handbook of applied spatial analysis. Springer, Berlin/Heidelberg, pp 91–112
Robinson AC (2007) A design framework for exploratory geovisualization in epidemiology. Inf Vis 6(3):197–214
Schintler LA, Fischer MM (2020) The analysis of big data on cities and regions – some computational and statistical challenges. In: Glaeser ED, Kourtit K, Nijkamp P (eds) Urban empires: cities as global rulers in the new urban world. Routledge, Abingdon, pp 197–209


Takatsuka M, Gahegan M (2002) GeoVISTA Studio: a codeless visual programming environment for geoscientific data analysis and visualization. Comput Geosci 28(10):1131–1144
Tukey JW (1977) Exploratory data analysis. Addison-Wesley, Reading
Unwin AR, Wills G, Haslett J (1990) REGARD – graphical analysis of regional data. In: ASA proceedings of the section on statistical graphics. American Statistical Association, Alexandria, pp 36–41

Further Reading

Bivand R, Pebesma EJ, Gómez-Rubio V (2013) Applied spatial data analysis with R, 2nd edn. Springer, New York/Heidelberg/Dordrecht/London
Li X, Anselin L, Koschinsky J (2015) GeoDa web: enhancing web-based mapping with spatial analytics. In: Proceedings of the 23rd SIGSPATIAL international conference on advances in geographic information systems – GIS’15. ACM Press, Bellevue/Washington, pp 1–4
Openshaw S, Charlton M, Wymer C, Craft A (1987) A mark 1 geographical analysis machine for the automated analysis of point data sets. Int J Geogr Inf Syst 1(4):335–358
Rey S, Anselin L, Li X, Pahle R, Laura J, Li W, Koschinsky J (2015) Open geospatial analytics with PySAL. ISPRS Int J Geo-Inf 4(2):815–836
Wang S, Goodchild M (eds) (2018) CyberGIS for geospatial discovery and innovation. Springer, Berlin/Heidelberg/New York

Spatio-temporal Data Mining Tao Cheng, James Haworth, Berk Anbaroglu, Garavig Tanaksaranond, and Jiaqiu Wang

Contents
1 Introduction
2 Spatio-temporal Dependence
2.1 Statistical Approaches
2.2 Data Mining Approaches
2.3 Summary
3 Space-Time Forecasting and Prediction
3.1 Statistical (Parametric) Models
3.2 Machine Learning (Nonparametric) Approaches
3.3 Summary
4 Space-Time Clustering
4.1 Partitioning and Model-Based Clustering
4.2 Hierarchical Clustering
4.3 Density- and Grid-Based Clustering
4.4 Statistical Tests for Clustering
4.5 Summary
5 Spatio-temporal Visualization
5.1 Two-Dimensional Maps
5.2 Three-Dimensional Visualization
5.3 Animated Maps
5.4 Visual Analytics: The Current Visualization Trend
5.5 Summary
6 Conclusion
7 Cross-References
References

T. Cheng (*) · J. Haworth
SpaceTimeLab, Department of Civil, Environmental and Geomatic Engineering, University College London, London, UK
e-mail: [email protected]; [email protected]
B. Anbaroglu
Department of Geomatics Engineering, Hacettepe University, Ankara, Turkey
e-mail: [email protected]
G. Tanaksaranond
Department of Survey Engineering, Chulalongkorn University, Bangkok, Thailand
e-mail: [email protected]
J. Wang
National Institute for Cardiovascular Outcomes Research, Barts NHS Trust, London, UK
e-mail: [email protected]
© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_68



Abstract

As the volume, variety, and veracity of spatio-temporal datasets increase, traditional statistical methods for dealing with such data are becoming overwhelmed. Nevertheless, spatio-temporal data are rich sources of information and knowledge, waiting to be discovered. The field of spatio-temporal data mining emerged out of a need to create effective and efficient techniques in order to turn big spatio-temporal data into meaningful information and knowledge. This chapter reviews the state of the art in spatio-temporal data mining research and applications, from conventional statistical methods to machine learning approaches in the big data era, with emphasis placed on three key areas: prediction, clustering/classification, and visualization.

Keywords

Spatio-temporal · Data mining · Prediction · Clustering · Visualization · Artificial intelligence · Machine learning

1

Introduction

Data mining is the process of discovering patterns in large datasets involving methods, techniques, and algorithms at the intersection of statistics, artificial intelligence, machine learning, and database systems. It should be noted that the term data mining is a misnomer, since the objective is to extract patterns and knowledge, not to mine the data itself. The term is also a buzzword, frequently applied to any form of large-scale information processing. The actual data mining task involves the semiautomatic or automatic analysis of large quantities of data to extract previously unknown patterns such as clusters, anomalies, and dependencies in the data. With automatic sensor networks, the Internet of Things, and crowdsourcing now being used extensively to monitor a diverse range of phenomena, the volume, variety, and veracity of spatio-temporal data being routinely collected have increased dramatically (Li et al. 2016). Examples include daily temperature series at weather stations, crime counts in census tracts, geotagged social media, and traffic flows on urban roads. Spatio-temporal data mining (STDM) is the subfield of data mining that focuses on the process of discovering patterns in large spatio-temporal (geolocated and time-stamped) datasets, with the overall objective of extracting information and transforming it into knowledge to enable decision making. The major tasks of STDM include spatio-temporal prediction, clustering, classification,


and visualization. The methods designed to carry out these tasks must account for spatio-temporal autocorrelation and heterogeneity, which sets them apart from traditional data mining methods.
Early methods for identifying patterns in spatio-temporal data include statistical models from the fields of time series analysis, spatial analysis, and econometrics. These approaches were typically concerned with teasing scarce information from homogeneous datasets, with an emphasis on statistical significance and explanatory power. Such approaches are appropriate in specific cases where the model assumptions hold but are not generally applicable when the size and diversity of spatio-temporal data become large. Researchers and practitioners have increasingly turned toward less conventional techniques that are better suited to the heterogeneous, nonlinear, and multi-scale properties of large-scale, streaming spatio-temporal datasets. For instance, methods from artificial intelligence, machine learning, and, increasingly, deep learning are now being successfully applied to STDM tasks. Recent surveys have reviewed the literature on STDM and its applications (Atluri et al. 2018).
This chapter begins in Sect. 2 with an introduction to spatio-temporal autocorrelation or dependence, the presence of which necessitates specialized STDM approaches. The remainder is organized around three main tasks of STDM: Sect. 3 is devoted to spatio-temporal modeling and prediction, by either statistical (parametric) or machine learning (nonparametric) approaches; Sect. 4 reviews spatio-temporal clustering, which is followed by an introduction to spatio-temporal visualization in Sect. 5. The final section summarizes future directions of research in STDM.

2

Spatio-temporal Dependence

An observation from nature is that near things tend to be more similar than distant things, both in space and in time. For instance, the weather tomorrow is likely to be more similar to today’s weather than the weather a week ago or a month ago, and so on. Similarly, the weather one mile away is likely to be more similar than the weather ten or 100 miles away. These phenomena are referred to, respectively, as temporal and spatial dependence. Such dependence can be present both within and between datasets (cross- or co-dependence). The presence of dependence in spatial and temporal data violates the independence assumptions of classic statistical models such as ordinary least squares and necessitates the use of specialized spatio-temporal analysis techniques. However, its presence also enables future events to be predicted based on knowledge of past events. The first step in a STDM workflow should be to identify whether STDM methods are appropriate to the task at hand. This involves examining the dataset for evidence of spatio-temporal patterns. In this section, we review the identification of spatio-temporal dependence and association from two viewpoints: statistical tests of spatial dependence and identification of association rules.


2.1


Statistical Approaches

Testing for dependence is typically accomplished using autocorrelation analysis. Autocorrelation is the cross-correlation of a signal with itself and can be measured in temporal data using the temporal autocorrelation function or in spatial data using an index such as the familiar Moran’s I coefficient (Anselin 1988). Both of these are based on Pearson correlation. The spatio-temporal autocorrelation function, also Pearson based, measures autocorrelation in spatio-temporal data at given spatial and temporal lags (Pfeifer and Deutsch 1981). Pearson-based measures are appropriate for data represented as graphs using a spatial weight matrix, examples of which include administrative polygons and road networks. Indices have also been devised for point attribute data, including space-time (semi)variograms (Heuvelink et al. 2015).
The aforementioned measures are global, implying a degree of fixity in the level of autocorrelation across space and time such that it can be described by a single parameter. However, this is often unrealistic. Time series may exhibit nonstationarity, which means that their mean and variance are not constant in time. The analogue of this in spatial data is referred to as heterogeneity. Heterogeneity has two distinct aspects: structural instability, as expressed by changing functional forms or varying parameters, and heteroskedasticity, which leads to error terms with nonconstant variance (Anselin 1988). Ignoring nonstationarity or heterogeneity can have serious consequences, including biased parameter estimates, misleading significance levels, and poor predictive power. Anselin (1988) provides some methods for testing for heterogeneity in spatial data. Additionally, a number of local indicators of spatial association have been devised (see the chapter ▶ “Exploratory Spatial Data Analysis” by Symanzik in this handbook). These include a local variant of Moran’s I and Getis and Ord’s Gi and Gi* statistics, which measure the extent to which high and low values are clustered together. A simple method that can be applied to spatio-temporal data is the cross-correlation function, which measures the correlation between two series at different temporal lags (Cheng et al. 2012; Figs. 1 and 2).

Fig. 1 (a) Cross-correlation function (CCF) and (b) coefficient of determination (CoD) between unit journey times of three pairs of road links in central London (R1616 & R524, R463 & R1593, R1593 & R2324) in the AM peak period (7–10 am) (Cheng et al. 2012)

Fig. 2 Average cross-correlation function (CCF) between links and their first-order neighbors at temporal lag zero in (a) the AM peak; (b) interpeak; and (c) PM peak (Cheng et al. 2012)
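A minimal sketch of the kind of cross-correlation computation shown in Figs. 1 and 2 is given below, using the ccf function from statsmodels on two synthetic series in which one lags the other by three time steps; the series and the lag are illustrative only, not taken from the study cited above.

```python
# Sketch: temporal cross-correlation between two synthetic journey-time
# series, where y lags x by three time steps by construction.
import numpy as np
from statsmodels.tsa.stattools import ccf

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.roll(x, 3) + 0.5 * rng.normal(size=500)  # y[t] echoes x[t - 3]

# ccf(y, x)[k] estimates corr(y[t + k], x[t]) for k = 0, 1, 2, ...
cross_corr = ccf(y, x, adjusted=False)[:10]
peak = int(np.argmax(np.abs(cross_corr)))
print(f"strongest cross-correlation at lag {peak}: {cross_corr[peak]:.2f}")
# expected: a clear peak at lag 3, reflecting the built-in delay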

Autocorrelation measures deal with correlation in attributes observed at spatio-temporal locations. Often, we are interested instead in the spatio-temporal arrangement of the data points themselves. For example, we may be interested in whether the spatio-temporal distribution of crime locations is random. In such cases, rather than testing for the presence of autocorrelation, we can test against an assumption of complete spatio-temporal randomness or employ indicators of spatio-temporal clustering such as the spatio-temporal Ripley’s K function. See González et al. (2016) for a review.

2.2

Data Mining Approaches

Association (or co-location) rule mining is an approach that has its roots in the data mining community. It involves inferring the presence of spatio-temporal features in the neighborhood of other spatio-temporal features. A spatio-temporal co-location rule implies a strong association between locations A and B: if the attributes of A take some specific value at a point in time, then with a certain probability, at the same point in time, the attributes of B will take some specific value. A related STDM task is mixed drove co-occurrence pattern mining. Mixed drove co-occurrence patterns are subsets of two or more different object types whose instances are often located close to one another in space and time. A review of these approaches can be found in Shekhar et al. (2015). The drawback of these methods is


that only contemporaneous associations are considered, so they do not account for the evolution of a spatial process over time. A logical extension to association mining is to analyze spatio-temporal sequential patterns. This involves finding sequences of events (an ordered list of item sets) that occur frequently in spatio-temporal datasets. Sequential pattern mining algorithms were first introduced to extract patterns from customer transaction databases. A spatio-temporal sequential pattern means that if, at some point in time and space, the attributes of A take some specific value, then with a certain probability, at some later point in time, the attributes of B will take some specific value. Sequential pattern mining implicitly incorporates the notion of spatio-temporal dependence: the events at one location at one time can have some causal influence on the events at another location at a subsequent time. A similar concept to sequential patterns is that of cascading spatio-temporal patterns, which are ordered subsets of events that are located close together and occur in a cascading sequence (Shekhar et al. 2015).

2.3

Summary

The methods described in this section help identify spatio-temporal dependence and structure in datasets. They can be used to inform whether STDM is an appropriate approach and the type of model that is required. If global spatio-temporal autocorrelation is present, then a global parametric model specification may be sufficient. However, if nonstationary or heterogeneous behavior is observed, then traditional parametric approaches may not be sufficient, and appropriate STDM methods must be employed. These approaches are discussed in the remainder of this chapter.

3

Space-Time Forecasting and Prediction

Space-time models must account for the problem of spatio-temporal autocorrelation. Uptake of spatio-temporal models was traditionally limited by the scarcity of large-scale spatio-temporal datasets. That situation has been reversed over recent decades: we are now inundated with data and require methods to deal with them quickly and effectively. The methods currently applied to space-time data can be broadly divided into two categories: statistical (parametric) methods and machine learning (nonparametric) methods. These are described in turn in the following subsections.

3.1

Statistical (Parametric) Models

The state of the art in statistical modeling of spatio-temporal processes represents the outcome of several decades of cross-pollination of research between the fields of


time series analysis, spatial statistics, and econometrics. Some of the methods commonly used in the literature include space-time autoregressive integrated moving average (STARIMA) models and variants, spatial panel data models, geographically and temporally weighted regression, eigenvector spatial filtering, and spatio-temporal Bayesian hierarchical models.

3.1.1 Space-Time Autoregressive Integrated Moving Average
Space-time autoregressive integrated moving average represents a family of models that extend the ARIMA time series model to space-time data (Pfeifer and Deutsch 1980). STARIMA explicitly takes into account the spatial structure in the data through the use of a spatial weight matrix (for the definition of a spatial weight matrix, see the chapter ▶ “Cross-Section Spatial Regression Models” by LeGallo in this handbook). The general STARIMA model expresses an observation of a spatial process as a weighted linear combination of past observations and errors lagged in both space and time. A fitted STARIMA model is usually described as a STARIMA(p,d,q) model, where p indicates the autoregressive order, d the order of differencing, and q the moving average order of the model. The application of STARIMA models has been fairly limited in the literature, with examples existing in traffic prediction and temperature forecasting. Some important special cases of the general STARIMA model should be noted: when d = 0, the model reduces to a STARMA model; furthermore, a STARMA model with q = 0 is a STAR model and with p = 0 is a STMA model. STARIMA models are useful tools for modeling space-time processes that are stationary (or weakly stationary) in space and time (i.e., spatio-temporal autocorrelation is low). Although the STARIMA model family accounts for spatio-temporal autocorrelation, it has not yet been adequately adapted to deal with spatial heterogeneity, as parameter estimates are global. Recent approaches have addressed this by allowing local and time-varying weight matrices (Cheng et al. 2014).
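For reference, the general STARMA form described above can be written as follows; this restatement follows the standard notation associated with Pfeifer and Deutsch (1980) and is added here for orientation rather than quoted from the original.

```latex
% General STARMA model (after Pfeifer and Deutsch 1980):
% z_t is the vector of observations at all locations at time t,
% W^{(l)} is the l-th order spatial weight matrix (W^{(0)} = I),
% lambda_k and m_k are the spatial orders of the k-th autoregressive
% and moving average terms, and epsilon_t is a random error vector.
z_t = \sum_{k=1}^{p} \sum_{l=0}^{\lambda_k} \phi_{kl} W^{(l)} z_{t-k}
    - \sum_{k=1}^{q} \sum_{l=0}^{m_k} \theta_{kl} W^{(l)} \varepsilon_{t-k}
    + \varepsilon_t
```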

3.1.2 Spatial Panel Data Models
Panel data is a term used in the econometrics literature for multidimensional data. A panel contains observations on multiple phenomena (cross-sections) over multiple time periods. Spatial panels typically refer to data containing time series observations of spatial units such as zip codes, regions, cities, states, and countries. Panel data are more informative and contain more variation and less collinearity among variables. The use of panel data increases efficiency in model estimation and allows for specifying more complicated hypotheses (Elhorst 2010). Two types of data models using spatial panels may be distinguished. The first type relates to static models. These models simply pool time series cross-section data but more often control for fixed or random spatial and/or time-period-specific effects. The second type relates to dynamic spatial panel data models, i.e., spatial panel data models with mixed dynamics in both space and time, which have received increasing attention in the spatial econometrics literature in recent years. The chapter ▶ “Spatial Panel Models and Common Factors” by Elhorst in this handbook provides a survey of the existing literature on spatial panel models, considering both static and dynamic models and discussing specification and estimation issues.

3.1.3 Space-Time Geographically Weighted Regression
Recently, there has been a great deal of interest in extending geographically weighted regression (GWR; see the chapter ▶ “Geographically Weighted Regression” by Wheeler in this handbook) to the temporal dimension. In their geographically and temporally weighted regression model, Huang et al. (2010) incorporate both the spatial and temporal dimensions into the weight matrix to account for spatial and temporal nonstationarity. The technique was applied to a case study of residential housing sales in the city of Calgary from 2002 to 2004 and found to outperform GWR as well as temporally weighted regression. A spatio-temporal kernel function has been proposed to replace the spatio-temporal weight matrix as a further extension of GWR (Fotheringham et al. 2015), and a mixed geographically and temporally weighted regression can explore both global and local spatio-temporal heterogeneity.

3.1.4 Space-Time Geostatistics
Space-time geostatistics is concerned with the statistical modeling of geostatistical data that vary in space and time. It extends geostatistical concepts such as covariance structures and semivariograms to the space-time domain (Heuvelink et al. 2015). The aim is to build a process that mimics some patterns of the observed spatio-temporal variability. The first step usually involves separating out the deterministic component m(u, t) of space-time coordinates u and t. Following this, a covariance structure is fitted to the residuals. The simplest approach is to separate space and time and consider the space-time covariance to be either a sum (zonal anisotropy model) or a product (separable model) of separate spatial and temporal covariance functions. Although simple to implement, these models have the disadvantage that they do not consider space-time interaction: they assume a fixed temporal pattern across locations and a fixed spatial pattern across time. Additionally, it is not straightforward to separate the component structures from the experimental covariances. The other approach is to model a joint space-time covariance structure, which is generally accepted to be more appropriate. Once an appropriate space-time covariance structure has been defined, one can use standard kriging techniques for interpolation and prediction. Space-time geostatistical techniques are best applied to stationary space-time processes; highly nonstationary spatio-temporal relationships require a very complicated space-time covariance structure to be modeled for accurate prediction to be possible. Despite being spatio-temporal in nature, the main function of space-time geostatistical models is space-time interpolation, and they encounter problems in forecasting scenarios where extrapolation is required. Recently, geostatistical models have been implemented in a Bayesian framework, which enables exact inference and quantification of uncertainty. They have also been adapted for network datasets, with case studies in transport.
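The two simple covariance structures mentioned in Sect. 3.1.4 can be written compactly as follows, with h a spatial lag and u a temporal lag; this notation is a common convention rather than one taken from a specific source.

```latex
% Sum (zonal anisotropy) and product (separable) space-time covariance models,
% where h is a spatial lag and u is a temporal lag:
C(h, u) = C_S(h) + C_T(u)      \quad \text{(zonal anisotropy / sum model)}
C(h, u) = C_S(h) \, C_T(u)     \quad \text{(separable / product model)}
```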

3.2 Machine Learning (Nonparametric) Approaches

In parallel to the development of statistical space-time models, there has been a multidisciplinary explosion of interest in nonparametric machine learning methods. Machine learning is a subfield of soft computing that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. It may be defined as an application of artificial intelligence with the primary objective of allowing computing systems to learn automatically, without human intervention or assistance, and to adjust their actions accordingly. Learning may be supervised, semi-supervised, or unsupervised. The field of machine learning is huge, and a comprehensive review of methods such as self-organizing maps, principal components analysis, classification and regression trees, and random forests is beyond the scope of this chapter. The book Machine Learning for Spatial Environmental Data by Kanevski et al. (2009) provides a good introduction to the field, with special emphasis on the use of spatial data.

3.2.1 Artificial Neural Networks
Artificial neural networks – vaguely inspired by information processing and communication patterns in biological nervous systems – are a specific set of algorithms that have revolutionized machine learning. Artificial neural networks may be viewed as general function approximators, which is why they can be applied to many domains such as classification and regression. In a neural network, data pass through interconnected layers of nodes, each layer classifying characteristics and information before passing the results on to nodes in subsequent layers. Neural networks and deep learning differ only in the number of network layers: a typical neural network may have two to three layers, while deep learning networks might have dozens or hundreds. There are supervised and unsupervised models using neural networks. The best known is the feedforward neural network, whose architecture is a connected and directed graph of neurons (processing units) with no cycles, trained using an algorithm called backpropagation. The reader is encouraged to review the key concepts of neural network modeling covered in Fischer (2015).

Artificial neural networks have been widely applied in spatial and temporal analysis. For example, Kanevski et al. (2009) have applied various types of neural network architectures to spatial and environmental modeling problems, including radial basis function neural networks, general regression neural networks, probabilistic neural networks, and neural network residual kriging models, and have obtained excellent results. The strength of neural networks is that they learn from empirical data and can be applied to almost any machine learning task that involves learning a complex mapping from the input to the output space. This makes them particularly effective in accounting for dependency structures present in space-time data that cannot be explicitly modeled. Artificial neural networks have been applied to spatio-temporal pattern recognition problems since the 1980s, and many of the algorithms are now regaining popularity in the guise of deep learning, the latest advance of machine learning.


Recently, deep learning has been used for predictive learning of spatio-temporal data. Deep learning networks can automatically extract latent spatial and/or temporal features to model the underlying spatio-temporal dependence in the data. Spatio-temporal deep learning models combine neural architectures with a large set of hidden layers for application to spatio-temporal datasets. Convolutional neural networks can be combined with recurrent neural networks or long short-term memory networks for grid-based spatio-temporal predictive learning.
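As a minimal illustration of the feedforward-plus-backpropagation scheme described above, the following sketch trains a one-hidden-layer network on a toy regression task in plain NumPy. The architecture, learning rate, and synthetic data are illustrative assumptions only; practical spatio-temporal deep learning models are built with dedicated frameworks.

```python
import numpy as np

# Minimal sketch of a feedforward network trained with backpropagation:
# one hidden tanh layer, squared-error loss, full-batch gradient descent.
rng = np.random.default_rng(2)

# Toy task: learn y = sin(x1) + cos(x2) from noisy samples
X = rng.uniform(-3, 3, size=(200, 2))
y = (np.sin(X[:, 0]) + np.cos(X[:, 1]) + 0.05 * rng.normal(size=200))[:, None]

W1 = rng.normal(0, 0.5, (2, 16)); b1 = np.zeros(16)   # input -> hidden
W2 = rng.normal(0, 0.5, (16, 1)); b2 = np.zeros(1)    # hidden -> output
lr = 0.05

for epoch in range(2000):
    # Forward pass through the layers
    H = np.tanh(X @ W1 + b1)
    pred = H @ W2 + b2
    err = pred - y
    # Backpropagation: chain rule through output and hidden layers
    dW2 = H.T @ err / len(X); db2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1 - H ** 2)
    dW1 = X.T @ dH / len(X); db1 = dH.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("final MSE:", float((err ** 2).mean()))
```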

3.2.2 Kernel Methods
In machine learning, kernel methods represent a class of methods and algorithms for pattern analysis, i.e., finding types of relationships (clusters, principal components, classifications, correlations). The methods use kernel functions that allow them to operate in a high-dimensional, implicit feature space without ever calculating the coordinates of the data in that space; instead, they compute the inner products between the images of all pairs of data points in the feature space. This operation is termed the kernel trick (or kernel substitution) and turns any linear model into a nonlinear model by replacing its features (predictors) with a kernel function. Applied to principal components analysis, for example, this technique leads to a nonlinear variant of principal components analysis. Additional information on kernel methods can be found in Chap. 6 of Bishop (2006).

The most prominent member of the class of kernel methods is the support vector machine. A support vector machine constructs a hyperplane or sets of hyperplanes for solving problems in classification, regression, and outlier detection. An important characteristic of support vector machines is that the determination of model parameters corresponds to a convex optimization problem, so that any local solution found is also a global optimum (Bishop 2006, p. 325). Kernel methods in general, and support vector machines in particular, have been applied to spatial time series in a number of application areas, including traffic flow and travel time forecasting (see, e.g., Haworth et al. 2014) and modeling exposure to ambient black carbon (Abu Awad et al. 2017).
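A sketch of how a support vector machine might be applied to spatial time series forecasting is given below, using scikit-learn's SVR with an RBF kernel on lagged observations from a site and a hypothetical neighboring site. The synthetic series, lag structure, and kernel parameters are illustrative assumptions, not the setup of the studies cited above.

```python
import numpy as np
from sklearn.svm import SVR

# Minimal sketch: support vector regression for short-term forecasting of a
# spatial time series, with lagged values at a site and its neighbor as features.
rng = np.random.default_rng(3)

T = 400
site = np.sin(np.arange(T) / 10.0) + 0.1 * rng.normal(size=T)         # target series
neighbor = np.sin((np.arange(T) - 3) / 10.0) + 0.1 * rng.normal(size=T)

# Features for predicting site[t]: site[t-1], site[t-2], neighbor[t-1]
X = np.column_stack([site[1:-1], site[:-2], neighbor[1:-1]])
y = site[2:]

model = SVR(kernel="rbf", C=10.0, epsilon=0.01)   # RBF kernel (kernel trick)
model.fit(X[:300], y[:300])                       # train on the first 300 steps
print("test R^2:", model.score(X[300:], y[300:]))
```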

3.3 Summary

In this section, the nonlinear, nonstationary properties of spatio-temporal data and their implications for space-time models were outlined. Which model should one choose for a given STDM task? The answer depends on the data. In the literature, space-time analysis has historically been applied to data with low spatial and/or temporal resolution. In the tradition of spatial analysis, such data are used to elicit causal relationships between variables that can give valuable insights into the underlying processes. In this case, the use of parametric statistical models may be preferable because of their explanatory power and interpretability.


However, these days more and more data sources are becoming available in (near) real time at high spatial and temporal resolutions. For such data, extracting meaningful relationships is a task that is secondary to forecasting, and machine learning approaches, with their greater flexibility, will play an ever-increasing role. Generally, machine learning methods have a wider field of application than traditional geostatistics due to their ability to deal with multidimensional nonlinear data. They are also well suited to dealing with large databases and long periods of observation. In particular, support vector machines appear to be favorable because they avoid the curse of dimensionality faced by other methods. Currently, deep learning is also gaining momentum in STDM. One future research direction in this area lies in improving the interpretability of the structure and output of machine learning algorithms. Another is to use a hybrid framework that combines statistical and machine learning approaches (Cheng et al. 2011).

4 Space-Time Clustering

Clustering is an unsupervised learning task that involves grouping unlabeled objects that share similar characteristics. In general, the goal is to maximize the difference between clusters while simultaneously maximizing the similarity within clusters. There are five main types of clustering algorithm: partitioning, model-based, hierarchical, density-based, and grid-based. In addition to these, statistical tests can also be used to detect clusters. These are discussed in the following subsections.

4.1 Partitioning and Model-Based Clustering

Perhaps the most well-known clustering algorithm, k-means, is a partitioning-based algorithm. This type works well for attribute clustering, with many applications in geo-demographics, but is less suitable for identifying interesting spatio-temporal structures such as hotspots because each point must be assigned to a cluster. Model-based approaches, such as Gaussian mixture models, are similar to partitioning approaches but have the benefit of assigning probabilities of class membership.
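The contrast between hard partitioning and probabilistic model-based assignment can be illustrated in a few lines; the sketch below uses scikit-learn's k-means and Gaussian mixture implementations on synthetic two-dimensional data, with k = 3 as an illustrative choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Minimal sketch: k-means returns a hard cluster label per point, while a
# Gaussian mixture returns probabilities of class membership.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(loc, 0.5, size=(50, 2))
               for loc in ([0, 0], [4, 0], [2, 3])])   # three synthetic blobs

hard = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
soft = GaussianMixture(n_components=3, random_state=0).fit(X).predict_proba(X)

print("k-means label of first point:", hard[0])
print("mixture membership probabilities:", soft[0].round(3))
```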

4.2 Hierarchical Clustering

Hierarchical clustering is a tree-based method that is related to decision tree approaches. Like partitioning and model-based clustering, hierarchical clustering is effective in clustering attributes. An advantage over the former two methods is that the tree can be cut at different levels to provide different numbers of clusters. However, it also has the drawback that every point must be assigned to a cluster, which limits its application to finding structure in the arrangement of spatio-temporal data.
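The property of cutting the tree at different levels to obtain different numbers of clusters can be sketched with SciPy's hierarchical clustering tools; the Ward linkage and the synthetic data are illustrative choices.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Minimal sketch: build one cluster tree, then cut it at two different
# levels to obtain two and four clusters from the same hierarchy.
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(loc, 0.4, size=(30, 2))
               for loc in ([0, 0], [3, 0], [0, 3], [3, 3])])

Z = linkage(X, method="ward")                     # the full tree, built once
two = fcluster(Z, t=2, criterion="maxclust")      # cut for 2 clusters
four = fcluster(Z, t=4, criterion="maxclust")     # cut for 4 clusters
print("cluster sizes at k=2:", np.bincount(two)[1:])
print("cluster sizes at k=4:", np.bincount(four)[1:])
```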

4.3 Density- and Grid-Based Clustering

Density-based clustering methods are particularly suited to spatio-temporal data because they can identify clusters of arbitrary shape and duration without the need to specify the number of clusters sought. They work by connecting points that are located close to one another into clusters; disconnected points are considered to be noise or outliers. ST-DBSCAN (spatio-temporal density-based spatial clustering of applications with noise) is a notable example of density-based clustering (Birant and Kut 2007). OPTICS (ordering points to identify the clustering structure) is a related approach that enables clusters of varying density to be found, which allows the multi-scale issue of spatio-temporal data to be tackled. Although most applications use point datasets, density-based methods can also be applied to other spatio-temporal data types such as trajectories.

Grid-based clustering is related to density-based clustering but involves first partitioning the spatio-temporal region into a multidimensional grid in which locally dense regions are sought. An example of this is ST-GRID (Wang et al. 2006). Grid-based approaches are computationally more efficient than density-based approaches but are dependent on the definition of the grid.
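A minimal sketch of density-based space-time clustering is given below. It is not the ST-DBSCAN algorithm of Birant and Kut (2007) itself, but a common approximation: the time axis is rescaled so that a single distance threshold (eps) defines a joint space-time neighborhood, and standard DBSCAN is then applied. The scaling factor and the synthetic events are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Minimal sketch: density-based clustering of space-time events (x, y, t)
# by rescaling time so that eps acts as a joint space-time radius.
rng = np.random.default_rng(5)

# Two compact space-time event clusters plus uniform background noise
cluster1 = rng.normal([2, 2, 10], [0.3, 0.3, 1.0], size=(40, 3))
cluster2 = rng.normal([7, 6, 30], [0.3, 0.3, 1.0], size=(40, 3))
noise = rng.uniform([0, 0, 0], [10, 10, 50], size=(30, 3))
events = np.vstack([cluster1, cluster2, noise])

time_scale = 0.3   # assumption: 1 time unit counts as 0.3 spatial units
scaled = events * np.array([1.0, 1.0, time_scale])

labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(scaled)
print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
print("points labelled as noise:", int((labels == -1).sum()))
```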

4.4 Statistical Tests for Clustering

There are various statistical tests for clustering in spatio-temporal data, such as the spatio-temporal Ripley’s K function, but most of these are global indices that do not identify the spatio-temporal location of clusters. A notable exception is the space-time scan statistic, a statistical test for clustering that was originally devised to detect disease outbreaks (Neill 2009). The method has since been applied to crime hotspot detection (Cheng and Adepeju 2014), among other problems. The goal of space-time scan statistics is to automatically detect spatio-temporal regions that are anomalous, unexpected, or otherwise interesting. Spatial and temporal proximities are explored by scanning the study region with overlapping cylindrical or rectangular space-time regions of varying sizes. The observed value in a space-time region is compared with its expected value based upon historical data, and statistical significance is evaluated using Monte Carlo simulation; regions whose observed values are significantly greater than expected are reported as clusters (Neill 2009).

Space-time scan statistics have the significant drawback that the entire study region has to be scanned at all space-time region sizes, which is computationally intensive and limits scalability. The assumption that a cluster is a regular geometrical shape is also not realistic (e.g., a disease might have spread via a river, thus affecting the people along the river’s course) and remains a limitation of the method. This problem could be tackled by generating irregularly shaped space-time regions or applying the method to other spatial configurations such as networks.
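The scan-and-simulate logic can be sketched as follows. This is a naive, brute-force illustration in the spirit of the expectation-based scan statistic of Neill (2009), not an efficient or faithful implementation; the grid size, cylinder radii and durations, baseline, and injected outbreak are all hypothetical.

```python
import numpy as np

# Naive sketch of a space-time scan: score every box-shaped space-time
# region by its ratio of observed to expected counts, then judge the best
# score against scores obtained on data simulated under the null.
rng = np.random.default_rng(6)

baseline = np.full((20, 20, 30), 2.0)                  # expected counts (x, y, t)
counts = rng.poisson(baseline)
counts[8:12, 8:12, 20:25] += rng.poisson(4.0, (4, 4, 5))   # injected outbreak

def best_score(c, radii=(1, 2), durations=(3, 5)):
    """Highest observed/expected ratio over all scanned space-time regions."""
    best = 0.0
    for r in radii:
        for d in durations:
            for x in range(r, 20 - r):
                for y in range(r, 20 - r):
                    for t in range(0, 30 - d):
                        obs = c[x - r:x + r + 1, y - r:y + r + 1, t:t + d].sum()
                        exp = baseline[x - r:x + r + 1, y - r:y + r + 1, t:t + d].sum()
                        best = max(best, obs / exp)
    return best

observed = best_score(counts)
# Monte Carlo significance: rescan datasets simulated under the null
null_scores = [best_score(rng.poisson(baseline)) for _ in range(9)]
p_value = (1 + sum(s >= observed for s in null_scores)) / 10
print(f"best cluster score {observed:.2f}, Monte Carlo p = {p_value:.2f}")
```

The brute-force scan over all region sizes is exactly the computational burden the text warns about; real implementations rely on pruning and efficient data structures.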

4.5 Summary

This section introduced methods for spatio-temporal cluster detection. Depending on the clustering task, different methods are more appropriate. Partitioning, model-based, and hierarchical approaches are most suited to attribute data, while statistical, density-based, and grid-based methods can identify the spatio-temporal location of clusters. Once clusters are found, it is the task of the practitioner to assign meaning to them.

5 Spatio-temporal Visualization

Mining interesting patterns, rules, and structures from spatio-temporal data is only part of the task of STDM. The results are not useful if they are not easily understood. For instance, finding a spatio-temporal cluster in a patient register dataset is not useful in itself. On the other hand, confirming this spatio-temporal cluster as a disease outbreak and visualizing it using a platform that epidemiologists and medical professionals can understand is very useful indeed. As a result, space-time visualization has emerged as another important facet of STDM. However, visualizing spatio-temporal data is an inherently difficult task because representing time on a map is challenging. Geographic visualization, often termed geovisualization, enhances traditional cartography by providing dynamic and interactive maps, and many new techniques for visualizing time on maps have been proposed. These techniques can be divided into three broad types: static two-dimensional (2D) maps, static three-dimensional (3D) maps, and animations (see the chapter ▶ “Geovisualization” in this handbook). Here we also cover the process of using visualization as an integral part of the STDM process, which is termed geovisual analytics.

5.1 Two-Dimensional Maps

There are various ways to represent time on static two-dimensional (2D) maps, either as a single static map or as multiple snapshots. Since all time steps are shown at the same time, the map-reader does not need to hold events in mind, which prevents critical information from being missed. However, 2D maps can only present a few time steps at a time due to the limitations of the available map media (computer screen, paper, etc.). This means that 2D mapping approaches must make some sacrifices in terms of the information they display.

Intelligent use of visual variables (colors, sizes, texture) can add additional information to 2D maps. The classic example of this technique is Minard’s 1869 map of Napoleon’s doomed 1812 campaign to Moscow. Time was displayed as an axis on the map (parallel to the axis of geographical position), and the number of remaining soldiers was shown by the thickness of the lines. Monmonier (1990) presented a range of methods for visualizing time on maps, including dance maps, chess maps, and change maps. Dance maps are used to


visualize the movement of spatial objects by drawing movement paths or pinpoints of objects on a 2D plane. A chess map is a series of maps laid out in a chessboard manner, with each map representing a single time slice; interpretation of changes is left to the user. A change map shows changes or differences against a reference time period, as an absolute value or a percentage. Change maps are effective for representing quantitative attributes, as users do not have to calculate the amount of change themselves. An alternative is to place charts on maps to show attribute change (Andrienko et al. 2007), although this approach has the drawback that the map display can become overcrowded.

Some approaches, known as cartograms, transform or distort geographic space in order to enhance information communication, with the London Underground map being a famous example. The spatial treemap is an example of a cartogram approach that divides geographic space into rectangular units within which temporal attribute information can be visualized (Wood and Dykes 2008). This technique allows the visualization of a large number of time points, but the user must relate the visualization to the underlying geography, which can be difficult. Cartograms can also be used to distort geographic space to represent time. For example, travel time on transportation networks can be represented as distances in map units. Alternatively, isochrones can be drawn on a map to represent lines of equal travel time from a given origin. Isochrones have the advantage that they do not distort the underlying geography.

5.2 Three-Dimensional Visualization

Three-dimensional (3D) visualization is a natural choice for spatio-temporal data because time can easily be represented as the third dimension, providing a simple way to show multiple times in a single view. The pioneering example of this is the space-time cube (Hägerstrand 1970). A space-time cube consists of two dimensions of geographic location on a horizontal plane and a time dimension along the vertical axis. Space-time cubes can be used to visualize trajectories of objects in 3D, or “space-time paths,” and the field of time geography has emerged from the study of trajectories in space-time cubes (Miller 2005). The space-time cube has two main limitations. Firstly, the 3D display makes it difficult to relate space-time paths to geo-locations and times. Secondly, the space-time cube has difficulty displaying large amounts of data. However, interactive techniques and data aggregation can be used to reduce clutter when displaying multiple objects. Alternatively, groups of objects can be represented as densities and visualized as isosurfaces. An isosurface is the three-dimensional analogue of an isoline, representing a surface of constant value within a volume of space. As well as density, isosurfaces can represent interpolated attributes such as travel times on a road network (Cheng et al. 2013). Isosurfaces work well in Euclidean space but are not as appropriate for network datasets. The 3D wall map is a 2D road map with an additional time dimension to display change: each layer represents the situation at one point in time, with color representing the attribute value (Cheng et al. 2013; Fig. 3).


Fig. 3 Wall map of travel delay (min/km) of outbound roads during the morning peak on 5, 12, 19, 26 October 2009 (Cheng et al. 2013)
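Returning to the space-time cube, a basic version is straightforward to draw with matplotlib: x and y span the horizontal plane and time runs along the vertical axis. The trajectory below is synthetic and purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Minimal sketch of a space-time cube: a space-time path plotted with the
# geographic plane horizontal and time on the vertical axis.
t = np.linspace(0, 10, 200)
x = np.cos(t) + 0.05 * t          # synthetic movement path
y = np.sin(t)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")   # 3D axes for the cube
ax.plot(x, y, t)                        # the space-time path
ax.set_xlabel("x"); ax.set_ylabel("y"); ax.set_zlabel("time")
plt.show()
```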

5.3 Animated Maps

With improvements in computing power and Internet technology in recent decades, animated maps have become a very active area of research and are now distributed widely on the Internet. The basic principle is to show a sequence of 2D maps, emphasizing dynamic variables such as duration, rate of change, order of change, frequency, display time, and synchronization (MacEachren et al. 2004). Weather maps and traffic maps are two of the many examples. Animated maps can use motion to emphasize key attributes by using, for example, blinking symbols to attract attention to a certain location on the map. The drawback of animated maps is that they do not incorporate interactivity, which limits their use for analysis purposes.

5.4 Visual Analytics: The Current Visualization Trend

Visual analytics is an outgrowth of the fields of scientific and information visualization. It refers to “the science of analytical reasoning facilitated by interactive visual interfaces” (Thomas and Cook 2005). In the context of STDM, visual analytics is referred to as geovisual analytics. Geovisual analytics is an iterative process that integrates visualization at various stages of the STDM process, including information gathering, data pre-processing, knowledge representation, and decision making. Normally, unknown data are first visualized to give a basic view, and users then apply their perception (intuition) to gain further insights from the images produced. The insights generated by visualization are then transformed into knowledge. After users have gained certain knowledge, they can generate hypotheses that will be


used to carry out further analysis using available data analysis and exploration techniques. The results of the analytical process are then visualized for presentation and further gains in knowledge.

In recent years, governments and businesses have made more of their data available through APIs (application programming interfaces). This has opened data sources to the public and enabled the process of visual analytics to be applied to streaming datasets in the form of dashboards and interactives. Compared with static 2D and 3D maps, dashboards are more effective in displaying and revealing spatio-temporal information because they can be customized: different aspects of the data can be displayed in different parts of the screen and dynamically linked to user input. Dashboards and interactives can be used primarily for visualization or to facilitate the STDM workflow by producing interpretable outputs from massive datasets. Several toolkits can be used to build interactive dashboards. For example, Shiny is an R package that allows users to create an interactive web-based dashboard simply with R scripts; visualization modules such as interactive maps, plots, tables, and textboxes can be integrated and organized in one framework. Code-free applications and platforms are also available to visualize spatio-temporal information, such as Tableau (https://www.tableau.com/) and ArcGIS operational dashboards (https://www.arcgis.com/). See the chapter ▶ “Web-Based Tools for Exploratory Spatial Data Analysis” by Linda See for web-based developments in exploratory spatio-temporal data analysis (ESTDA).

5.5 Summary

Spatio-temporal visualization is an integral part of the STDM process. Advances in Internet technology and computing facilities have made visualization of large streaming datasets more accessible than ever. However, despite significant progress, visualizing large volumes of data in real time and making effective use of the third dimension remain a challenge. In particular, the results of complex analyses should also be visualized in a geovisual analytics workflow in order to add value to spatio-temporal data.

6 Conclusion

Since the concept of knowledge discovery from databases was proposed in 1988, tremendous progress has been made in data mining and spatial data mining (Miller and Han 2009; Li et al. 2016). STDM has only been possible because of the progress in those areas, along with GIS and geocomputation (see the chapter ▶ “Geographic Information Science” by Goodchild and Longley in this handbook). This chapter has introduced the fundamentals of STDM, which consist of space-time prediction, clustering, and visualization. However, the field of STDM is far from mature, and further research is needed in the following areas:
• New methods are needed for mining crowd-sourced data. This data is often extremely noisy, biased, and nonstationary. Examples include geotagged social media and trajectory data obtained from smartphones. This area is relevant to the recent development of citizen science, volunteered geographic information, and the Internet of Things in particular.
• Methods are required to extract meaningful patterns from individual sources and integrate them in complex network frameworks. In particular, the linkage between different networks such as transport, infrastructure, social networks, and the economy is a significant challenge. Using network frameworks, the interactions within and between systems can be considered in mining spatio-temporal patterns. This aspect is relevant to complexity theory and network dynamics in particular.
• STDM techniques are required for emergence and tipping point detection. For example, how do we find emergent patterns and tipping points in the performance of the economy or the spread of a disease that can generate actionable insights? It is important to find outliers, but more important is finding the critical points before a system breaks down, so that mitigating action can be taken to avoid the worst scenarios, such as traffic congestion and epidemic transmission. This aspect is relevant to the latest developments in system optimization and in reinforcement learning to improve the operation of systems.
• Another challenge of STDM is how to calibrate, explain, and validate the knowledge extracted. A good example of this is the calibration of spatial (or spatio-temporal) autocorrelation. Higher-order spatial autocorrelation models have been developed, but pitfalls have also been found (LeSage and Pace 2011). Nonstationarity and autocorrelation are fundamental to our observation (or our empirical test) of reality; it is hard to prove whether higher-order autocorrelation propagates from the first order to the second and then to the third, or from the first to the third directly, which makes the explanation unconvincing. Furthermore, validation is difficult: so far, Monte Carlo simulation is the main tool, and it is itself based upon statistical distributions. This makes machine learning more promising in future STDM.
• Technically, grid computing and cloud computing allow data mining to be implemented across multiple computing resources. But how to scale algorithms to larger datasets will always be a challenge for data mining, given that the increase in data volume is far quicker than the improvement in the performance of data processors. Online machine learning could be a potential solution to this.

Note that this chapter has focused mainly on spatial data in the form of points, networks, and grids, not on remote sensing data, which is another broad area of research.

7 Cross-References

▶ Bayesian Spatial Analysis
▶ Exploratory Spatial Data Analysis
▶ Geographically Weighted Regression
▶ Geographic Information Science
▶ Geospatial Analysis and Geocomputation: Concepts and Modeling Tools
▶ Geostatistical Models and Spatial Interpolation
▶ Geovisualization
▶ Spatial Autocorrelation and Moran Eigenvector Spatial Filtering
▶ Spatial Clustering and Autocorrelation of Health Events
▶ Spatial Dynamics and Space-Time Data Analysis
▶ Spatial Panel Models and Common Factors
▶ Web-Based Tools for Exploratory Spatial Data Analysis

References
Abu Awad Y, Koutrakis P, Coull BA, Schwartz J (2017) A spatio-temporal prediction model based on support vector machine regression: ambient black carbon in three New England states. Environ Res 159:427–434
Andrienko G, Andrienko N, Jankowski P, Keim D, Kraak M-J, MacEachren A, Wrobel S (2007) Geovisual analytics for spatial decision support: setting the research agenda. Int J Geogr Inf Sci 21(8):839–857
Anselin L (1988) Spatial econometrics: methods and models. Springer, Dordrecht
Atluri G, Karpatne A, Kumar V (2018) Spatio-temporal data mining: a survey of problems and methods. ACM Comput Surv 51(4):83:1–83:41
Birant D, Kut A (2007) ST-DBSCAN: an algorithm for clustering spatial–temporal data. Data Knowl Eng 60(1):208–221
Bishop C (2006) Pattern recognition and machine learning. Springer, New York
Cheng T, Adepeju M (2014) Modifiable temporal unit problem (MTUP) and its effect on space-time cluster detection. PLoS ONE 9(6):e100465
Cheng T, Wang J, Li X (2011) A hybrid framework for space–time modeling of environmental data. Geogr Anal 43(2):188–210
Cheng T, Haworth J, Wang J (2012) Spatio-temporal autocorrelation of road network data. J Geogr Syst 14(4):389–413
Cheng T, Tanaksaranond G, Brunsdon C, Haworth J (2013) Exploratory visualisation of congestion evolutions on urban transport networks. Transp Res C Emerg Technol 36:296–306
Cheng T, Wang J, Haworth J, Heydecker B, Chow A (2014) A dynamic spatial weight matrix and localized space–time autoregressive integrated moving average for network modeling. Geogr Anal 46(1):75–97
Elhorst JP (2010) Spatial panel data models. In: Fischer MM, Getis A (eds) Handbook of applied spatial analysis. Software, tools, methods and applications. Springer, Berlin/Heidelberg, pp 172–192
Fischer MM (2015) Neural networks. A class of flexible non-linear models for regression and classification. In: Karlsson C, Andersson M, Norman T (eds) Handbook of research methods and applications in economic geography. Elgar, Cheltenham, pp 172–192
Fotheringham AS, Crespo R, Yao J (2015) Geographical and temporal weighted regression (GTWR). Geogr Anal 47(4):431–452
González JA, Rodríguez-Cortés FJ, Cronie O, Mateu J (2016) Spatio-temporal point process statistics: a review. Spat Stat 18(Part B):505–544
Hägerstrand T (1970) What about people in regional science? Pap Reg Sci 24(1):7–24
Haworth J, Shawe-Taylor J, Cheng T, Wang J (2014) Local online kernel ridge regression for forecasting of urban travel times. Transp Res Part C Emerg Technol 46:151–178
Heuvelink GBM, Pebesma E, Gräler B (2015) Space-time geostatistics. In: Shekhar S, Xiong H, Zhou X (eds) Encyclopedia of GIS. Springer, Cham, pp 1–7


Huang B, Wu B, Barry M (2010) Geographically and temporally weighted regression for modeling spatio-temporal variation in house prices. Int J Geogr Inf Sci 24(3):383–401
Kanevski M, Timonin V, Pozdnukhov A (2009) Machine learning for spatial environmental data: theory, applications, and software. EPFL Press, Lausanne
LeSage JP, Pace RK (2011) Pitfalls in higher order model extensions of basic spatial regression methodology. Rev Reg Stud 41(1):13–26
Li S, Dragicevic S, Castro FA, Sester M, Winter S, Coltekin A, Pettit C, Jiang B, Haworth J, Stein A, Cheng T (2016) Geospatial big data handling theory and methods: a review and research challenges. ISPRS J Photogramm Remote Sens 115:119–133
MacEachren AM, Gahegan M, Pike W, Brewer I, Cai G, Lengerich E, Hardisty F (2004) Geovisualization for knowledge construction and decision support. IEEE Comput Graph Appl 24(1):13–17
Miller HJ (2005) A measurement theory for time geography. Geogr Anal 37(1):17–45
Miller HJ, Han J (2009) Geographic data mining and knowledge discovery, 2nd edn. CRC Press, Boca Raton
Monmonier M (1990) Strategies for the visualization of geographic time-series data. Cartographica 27(1):30–45
Neill DB (2009) Expectation-based scan statistics for monitoring spatial time series data. Int J Forecast 25:498–517
Pfeifer PE, Deutsch SJ (1980) A three-stage iterative procedure for space-time modelling. Technometrics 22(1):35–47
Pfeifer PE, Deutsch SJ (1981) Variance of the sample space-time autocorrelation function. J R Stat Soc Ser B Methodol 43(1):28–33
Shekhar S, Jiang Z, Ali RY et al (2015) Spatiotemporal data mining: a computational perspective. ISPRS Int J Geo-Inf 4(4):2306–2338
Thomas JJ, Cook KA (2005) Illuminating the path: the research and development agenda for visual analytics. National Visualization and Analytics Center. https://ils.unc.edu/courses/2017_fall/inls641_001/books/RD_Agenda_VisualAnalytics.pdf
Wang M, Wang A, Li A (2006) Mining spatial-temporal clusters from geo-databases. In: Li X, Zaïane OR, Li Z (eds) Advanced data mining and applications. Springer, Berlin/Heidelberg, pp 263–270
Wood J, Dykes J (2008) Spatially ordered treemaps. IEEE Trans Vis Comput Graph 14(6):1348–1355

Scale, Aggregation, and the Modifiable Areal Unit Problem
David Manley

Contents
1 Introduction
2 MAUP Definitions
2.1 The Scale Effect
2.2 The Zonation Effect
3 Approaches to Understanding
3.1 From Univariate Statistics to Spatial Models
3.2 The Importance of Spatial Autocorrelation
3.3 Exploring the MAUP Through Zone Design
4 Conclusion
5 Cross-References
References

Abstract

The modifiable areal unit problem (MAUP) is a serious analytical issue for research using spatial data. The MAUP manifests itself through the apparent instability of statistical results derived from alternative aggregations of the same data. These results are conditional on two facets of aggregation: the spatial scale at which the units are measured and the configuration of the units used at that scale. The uncertainty caused by the MAUP affects the robustness and reliability of statistical results. Although solutions have been proposed, none has been applicable in more than a handful of specific cases, though recent work has pointed to some of the underlying causes, furthering our understanding. This chapter charts the scale and zonation effects and details the important role of spatial autocorrelation and spatial structures in understanding the processes that lead to the statistical uncertainty. The role of zone design as a tool to enhance analysis is explored, and reference is made to analyses that have adopted explicit spatial frameworks.

Keywords

Spatial autocorrelation · Spatial data · Scale effect · Areal unit · Spatial framework

1 Introduction

A serious problem for analysts using spatial data is that while the phenomena they are investigating may be continuous, the data available frequently are not. Unfortunately, there are no spatial equivalents to the day, month, or year, and the areal units used to present continuous data are arbitrary compromises, often designed to suit a wide range of uses. Consequently, the results of statistical analysis of individual data that have been aggregated into areal units vary across a wide range of measures. This problem is known as the modifiable areal unit problem (MAUP), and it has vexed users of aggregate data for many decades. Yet we have neither a full and detailed understanding of the problem nor of its underlying causes. Openshaw (1984) concluded that an analytical solution to the MAUP is unlikely ever to be realized, owing to the wide range of possibilities that arise when the partitioning of continuous space is implemented, as well as the wide range of analytical tasks that aggregated data are required to perform (see also Wong 2009). Where solutions have been proposed, they have only worked in a couple of specific instances rather than being generalizable. Instead, the MAUP needs to be accounted for clearly in the research hypothesis that precedes analysis.

In the twenty-first century, spatial data are an increasingly important factor in everyday life. Almost all countries in the developed world collect and publish data using administrative boundary systems – areal units. In the United Kingdom, the decennial population Census is published using small, low-level areal units, and the smallest areal units of the British Census were designed explicitly drawing on the principles of the MAUP, promoting, amongst other things, internal homogeneity across a range of important indicators such as housing tenure. By contrast, the problem of the MAUP is magnified by the often temporary nature of the areal units and the frequent revisions that are made to the coverages to reflect changes in population data. However, despite the prevalence of the MAUP in spatial data, it is an issue that is all too frequently ignored or neglected in geographical analysis. A search in Google Scholar on the term “modifiable areal unit problem” reveals only 9,400 publications, a low number when you consider the number of papers that mention spatial data explicitly (around 4.7 million alone, and this does not include those papers using aggregate spatial data but referring to it in another way!). The lack of attention paid to the MAUP has, perhaps, two underlying causes. Firstly, the readily available nature of many areal unit systems means that most of the research using aggregate


data adopts areal boundaries that are generated a priori, so engagement with the creation of areal units is not required. Secondly, the results of many quantitative studies that employ aggregate data of one sort or another rely on the implicit assumption that the MAUP is not a significant problem in order to present valid results. To acknowledge the MAUP, even informally, raises questions about the validity of the analysis conducted and any conclusions reached. Openshaw’s conclusion from almost three decades ago remains as pertinent today as it was when he wrote it: “this is hardly a satisfactory basis for the application and further development of spatial analysis techniques in geography” (1984, p. 5).

This chapter explores the problem of the MAUP in the context of spatial data analysis, outlining the two major aspects of the problem: the scale effect and the zonation effect. Definitions are provided for both aspects, and examples are drawn from the literature to illustrate the problems. Following these two sections, an overview of the evidence relating to the MAUP is provided.

2 MAUP Definitions

There are two aspects to the MAUP, known as the scale effect and the zonation effect (the latter is also called the aggregation effect in some of the literature, see for instance Openshaw (1977), but as the process of aggregation is involved in both scale and zonation decisions, we make the distinction using the term zonation). This section outlines the two aspects with reference to relevant examples and provides the context for a discussion of empirical results in the following sections.

2.1 The Scale Effect

The scale effect occurs because of the nested hierarchies within which human society is arranged and is expressed through the task of choosing the most appropriate scale for analysis (Arbia 1989; Fig. 1). It is often considered that there are multiple scales at which an analysis could (theoretically) be undertaken. Drawing on the United Kingdom Census as an example, output areas (OAs, typically 300–400 individuals) form the basic spatial units and can be aggregated into higher-level spatial units, such as wards (usually a couple of thousand individuals) and districts (many hundreds of thousands of individuals).

Fig. 1 The scale problem: the three different scales could represent (a) output areas, (b) wards, and (c) districts

The classic example of the scale effect was published by Gehlke and Biehl (1934), who used three different datasets including random coin tosses, census data, and experimental groups of rural counties drawn from the United States. Using correlation analysis, they demonstrated that the coefficients between census data reporting juvenile delinquency and monthly house rentals tended to increase as the number of areal units representing the data decreased. While the census data may be susceptible to unobserved structures within the data that in turn cause the instability of the statistical results, the coin toss data demonstrated that correlation coefficients changed even when the underlying data were effectively generated randomly and each data unit was independent of all others. From their analysis, Gehlke and Biehl concluded by questioning whether “a geographical area is an entity possessing traits, or merely one characteristic of a trait itself” (p. 170). Ultimately, they urged caution in the treatment of data from areal units, observing “that variations in the size of the correlation coefficient seemed conditioned on the changes in the size of the unit used” (op. cit.).

To further explore the scale effect, Kirby and Taylor (1976) used data on voting patterns to illustrate the potential pitfalls and to identify pockets of the population who vote differently from the overall outcome for an area. The implication of this finding is that by conducting analyses at different scales (or zonation configurations) it is possible to produce different aggregate results from a single unvarying individual distribution. Kirby and Taylor also discuss the dilemma of the choice of scale: if the scale is too small, it is not possible to compare data sources from different (modifiable) unit systems, while aggregating to coarser scales entails a loss of detail that may obscure some of the variation in the data. The scale effect has, therefore, several different elements, including the enhancing or smoothing of spatial processes, akin to the statistical smoothing of data to remove noise, and the arbitrary nature of the number of units that should be used to subdivide a space.
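The smoothing mechanism behind the scale effect can be demonstrated with a short simulation, a minimal sketch in the spirit of the Gehlke and Biehl experiments rather than a reconstruction of them: two unit-level variables share a spatially autocorrelated component buried in independent noise, and their correlation is recomputed under progressively coarser aggregations. The lattice size, noise levels, and the patchwise-constant shared component are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the scale effect: correlation rises under aggregation
# because independent noise averages away faster than the spatially
# autocorrelated shared component.
rng = np.random.default_rng(8)

n = 64
# Shared component: constant over 8x8 patches, hence spatially autocorrelated
common = np.kron(rng.normal(size=(8, 8)), np.ones((8, 8)))
x = common + rng.normal(scale=2.0, size=(n, n))
y = common + rng.normal(scale=2.0, size=(n, n))

def aggregate(a, block):
    """Average a 2D array over non-overlapping block x block windows."""
    m = a.shape[0] // block
    return a[:m * block, :m * block].reshape(m, block, m, block).mean(axis=(1, 3))

for block in (1, 2, 4, 8, 16):
    xa, ya = aggregate(x, block), aggregate(y, block)
    r = np.corrcoef(xa.ravel(), ya.ravel())[0, 1]
    print(f"{(n // block) ** 2:5d} units: r = {r:.3f}")
```

The printed correlation rises sharply as the number of units falls, mirroring the pattern Gehlke and Biehl reported for their county aggregations.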

2.2 The Zonation Effect

Once the spatial scale has been determined, we can then consider how the space is to be divided up – the zonation effect. The zonation effect occurs where there are “any variations in results due to alternative units of analysis where [...] the number of units, is constant” (Openshaw and Taylor 1979). The extent of the problem can be appreciated when you consider that there are, potentially, an infinite number of different ways in which a continuous space can be subdivided into discrete areal units. A diagrammatic interpretation of the zonation problem is presented in Fig. 2. Even when the initial starting state of the space is constrained by a set of predetermined units, the issue is nontrivial, as emphasized by Openshaw (1984), noting that even a relatively small set of zones – such as moving from 1,000 small units to just 20 larger ones – produces approximately 10^1260 unique combinations! For Openshaw (1984) the zonation effect was the greater of the two aspects of the MAUP, as there is considerably more freedom in choosing the delineation of boundaries than in choosing the number of zones required. As a consequence, “the process of zonation becomes susceptible to the whims of those involved in the overall aggregation process” (Openshaw and Taylor 1981, p. 61). While this position may be extreme, it makes the point that there are serious problems with the arbitrary nature of many areal units.

Fig. 2 The zonation problem. Each of these diagrams demonstrates a division of a sample space into 36 distinct areal units, yet each could potentially yield different results

Openshaw and Taylor (1979, 1981) conducted one of the largest investigations into the MAUP. Replicating the earlier work of Gehlke and Biehl, they used correlation analysis to assess the instability of statistical results caused by the MAUP. In the first instance, they correlated the proportion of Republican voters against the percentage of the population above 65, using the 1970 US Census. To assess the impact of the zonation effect, Openshaw and Taylor produced correlation coefficients for multiple arrangements of counties in the state of Iowa, holding the scale constant and each time aggregating the base units into six zones. Table 1 reports the results of their analysis.

Table 1 Correlation coefficients from Openshaw and Taylor (1979, p. 129) showing the zonation effect (adapted)

Number of areal units              Correlation coefficient (r)
Six republican-proposed            0.482
Six democratic-proposed            0.627
Six congressional districts        0.265
Six urban/rural regional types     0.862
Six functional regions             0.713

Openshaw and Taylor (1979) demonstrated that it was possible to obtain highly changeable correlation coefficients for a single set of data. They went further in the article by attempting to describe the universe of correlation coefficients that it was possible to achieve using the different scales of zonation. For many of the scales, they claim that the theoretical range of the coefficient was from −0.999 to 0.999.


However, this was rarely the case for many of the zonation systems that they devised. For instance, using 72 zones the minimum found was 0.579, and the maximum was 0.927. This demonstrates the impact of the zonation effect, as differing boundary choices change the correlation coefficient values. This work led to the conclusion that the MAUP (scale and zonation effects) was largely random and unpredictable. However, this position, as outlined below, is unlikely to be the case.
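The instability of the correlation coefficient under alternative configurations at a fixed scale can also be demonstrated with a short simulation. The sketch below is only loosely in the spirit of the Iowa experiments: the groupings are random and non-contiguous rather than contiguous zones, but the variability of r across alternative six-zone configurations of the same data is already visible. All parameters are illustrative.

```python
import numpy as np

# Minimal sketch of the zonation effect: hold the number of zones fixed and
# re-draw their composition at random; each configuration of the same
# underlying data yields a different correlation coefficient.
rng = np.random.default_rng(9)

units = 120                              # base areal units
common = rng.normal(size=units)          # shared component
x = common + rng.normal(scale=1.5, size=units)
y = common + rng.normal(scale=1.5, size=units)

zones = 6
results = []
for _ in range(10):                      # ten alternative zonations
    assignment = rng.permutation(np.repeat(np.arange(zones), units // zones))
    xz = np.array([x[assignment == z].mean() for z in range(zones)])
    yz = np.array([y[assignment == z].mean() for z in range(zones)])
    results.append(np.corrcoef(xz, yz)[0, 1])

print("correlation range over zonations:",
      f"{min(results):.3f} to {max(results):.3f}")
```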

3 Approaches to Understanding

There is a vast body of research that has sought to gain a greater understanding of the MAUP and of how it can impact the results of statistical analysis. This section reviews work that has sought to unpick how the MAUP can lead to different results in statistical analysis. Starting with simple examples of uni- and bivariate analyses, the evidence showed the potential severity of the MAUP. The literature built on this by examining the impact of the MAUP in models that are more explicitly spatial in configuration. Attention then shifted to the role of spatial autocorrelation and spatial cross-correlation, two of the fundamental processes that lie behind the incidence of the MAUP. Finally, work has explored the process of zonation through zone design.

3.1 From Univariate Statistics to Spatial Models

There are many examples of investigations into univariate and bivariate parameter instability as a consequence of the MAUP. In an article that clearly demonstrates the importance of preserving the availability of small area estimates for understanding societal processes, Flowerdew (2011) took 2001 Census data for England and presented an investigation into the severity of the MAUP. However, differences in results do not tell the whole story, as they may not be statistically different: evidence of difference is not different evidence. Developing this theme with 18 common variables, Flowerdew demonstrates that even just using the three standard spatial scales at which the data are released leads to results with significant statistical differences. Flowerdew uses the Fisher transformation to standardize the correlation coefficients and concludes that, after standardization, the MAUP effect leads to different results in around 60% of the cases. In general, under increasing scale aggregation, the increase in the correlation coefficient is a consequence of the data smoothing properties associated with the aggregation process. As such, the variation between variables tends to decrease as aggregation increases: the heterogeneity between units will fall, as greater numbers of the population are combined into single entities, while the heterogeneity within units increases.

There are fewer examples of investigations into the MAUP using multivariate analysis. A prime example used American Census data and demonstrated the problem with a regression model that related mean family income to a set of socioeconomic variables for multiple unit configurations at various spatial scales


(Fotheringham and Wong 1991). As would be expected, this analysis demonstrated that different spatial scales lead to systematic variability in the outcome of the regression analysis, so that for some parameters (percentage elderly and percentage blue-collar workers) the relationship to mean household income becomes more negative. Conversely, other parameters (percentage of home owners and percentage of black residents) become systematically less negative as aggregation increases. Again, the importance of the MAUP is demonstrated by additionally determining where parameters change significantly and concluding that there are “many places [...] where the parameter estimates are significantly different” (Fotheringham and Wong 1991, p. 1035).

Although the models above consider the MAUP, none of them explicitly incorporates the spatial structure of the data. Moving beyond these aspatial models requires a more complex modeling strategy and the explicit adoption of a spatial framework. An example of a spatial investigation into the MAUP was conducted by Fischer and colleagues. In their work, they investigate what they term the scale hypothesis and the aggregation hypothesis (in other words, the scale effect and the zonation effect) with respect to the supply of labor in multiregional labor markets (Baumann et al. 1983). Adopting a standard MAUP approach, they suggest that the way in which the model of labor supply is measured, through participation rates and commuting flows, may be affected by the scale at which an analysis is conducted and the regions through which the multiple labor markets are realized. In their findings, Fischer and colleagues present a number of outcomes. Firstly, in terms of determining labor participation (the number of males and females in employment), the effects of scale are relatively small: there is little variation in the result as the spatial scale of the analysis is altered. However, in a model representing commuting patterns, the scale effects are much larger, a finding which intuitively makes sense, as commuting is only realized in the framework when zone boundaries are crossed. Increasing the scale will, all other things being equal, reduce the number of boundaries and so the level of commuting. In summarizing their findings, Fischer and colleagues highlight that the spatial framework adopted for an analysis is crucial, and that it is “by no means admissible to ignore possible effects of the choice of a spatial framework in spatial model building” (p. 67). Finally, they suggest that when seeking out the most appropriate spatial framework, a range of criteria including model R², t-values, and a priori signs should be considered. Ultimately, this leads to the conclusion that the most appropriate spatial framework is the one that leads to the greatest level of explanation and the best model performance overall.

A major area of interest in the spatial organization of individuals within and between areas is segregation research (see also Poulsen et al. 2011). Segregation is a highly spatial phenomenon, and there are many examples within the literature where spatial statistics have been used to attempt to understand the role that the definition of the areal units and the scale of analysis can have on the resulting measures. Wong’s investigation into segregation indices and the MAUP demonstrated that, in general, the finer the spatial resolution (scale), the greater the degree of segregation identified (Wong 2003). As discussed, the scale process is akin to data smoothing, so that sharp differences between smaller units are removed.


Thus, as the areal units become smaller, the potential level of homogeneity within each unit will increase, because fewer individual data points are represented within each unit (up to the level of the single individual, the atomistic unit, beyond which it is no longer realistic to decompose and which represents a perfectly homogeneous social unit). Using multiple scales of aggregation, Wong demonstrates that different scales produce different results for the dissimilarity index, D (see Duncan and Duncan 1955). To understand the impact of the MAUP scale effect, Wong proposes that the index can be decomposed into regional and local effects, with the local-level measure capturing the deviation of each unit from the global regional D value. The range of values achieved can give insight into how much each local unit influences the overall segregation pattern: high values identify areas that deviate from the global value, while lower values demonstrate congruence. One influence that Wong does not attempt to cover is the effect of zonation differences, but it is clear that with a small extension it would be possible to use Wong’s methodology to assess the impact of altering the boundaries on the resulting segregation outcomes.

Two further examples of the MAUP impacting the results of spatial statistical analysis are provided by the health literature. The first study relates to the effects of the Dounreay Nuclear Power Plant on instances of childhood leukemia (Heasman et al. 1984). Initial work suggested that the incidence of leukemia was unusually high in close proximity to the Dounreay plant. To investigate whether these were significant leukemia clusters, the Scottish Health Service analyzed data recording all incidences of cancer between 1968 and 1986. The initial results of the analysis showed a significant excess of cases in the Dounreay area. However, at the subsequent public inquiry a number of methodological weaknesses were identified, amongst which was the issue of boundary definition. Wilkie (1986) provided details of the methodological problems, which included the potential gerrymandering (manipulation) of both the radial distances used to describe the areas and the time period studied. With tight boundaries around cancer points (a low spatial scale), mortality rates were increased, creating artificially high results because of the smaller population bases. Similarly, the choice of time period, either cutting the time series data into different lengths or curtailing the investigation at an earlier time point, influenced the outcomes. Further problems arose from the presence of edge effects (cases appearing near the edge of the study space may be excluded) and from the irregularly shaped areal units used for the aggregation. Finally, the use of areal units to imprecisely locate individual incidence data introduced small errors which, cumulatively, could result in the erroneous generation of clusters where there were none, or vice versa. In conclusion, the findings of the Dounreay analysis were difficult to evaluate robustly, as the choices of radii and time periods for the study area “are arbitrary” (Wilkie 1986, p. 266). Ultimately, any cluster of cases in one area and time period could be eliminated simply through an alternative choice of radii or time periods.

The second of the health examples is provided by Odoi and colleagues (2003). They were investigating the impact of the MAUP on the spatial distribution of


human giardiasis (a parasitic infection causing diarrhea) in Canada. The study set out to explicitly examine the impact of alternative spatial scales on the identification of infection clusters, and whether global or local statistics provided the most appropriate statistical framework for assessing the clustering. Their analysis demonstrated that using a fine spatial scale with relatively small units enabled the detection of clusters that were hidden at the higher spatial scale. They also identified that local statistical measures provided more clustering detail than the global measures and as such were more appropriate for the exploratory analysis of patterns in spatial data.

3.2 The Importance of Spatial Autocorrelation

Tobler’s First Law of Geography states that all things are related, but near objects are more related than distant objects (Tobler 1970). More formally, the degree of similarity between two or more spatially collocated objects is known as spatial autocorrelation (see Getis 2010 for a comprehensive review). Cliff and Ord (1981) make the link between spatial autocorrelation and the MAUP explicit and note that the size of the cells in the areal unit system is important in determining the strength of the spatial autocorrelation. All other things being equal, larger areal units will have lower levels of autocorrelation than smaller ones. Thus, at different spatial scales, different patterns and degrees of spatial autocorrelation will be present and will impact the structure of the data being analyzed.

Returning to the work of Fotheringham and Wong (1991): after assessing the significance of the changes in parameter estimates, they investigated whether there was a link between these changes and spatial autocorrelation in the variables analyzed. Their conclusion was that there was little link, a conclusion illustrated using the examples of the percentage of black individuals and the percentage of home owners: the regression parameters for these variables behaved very similarly under aggregation in terms of the magnitude of significant change, yet possessed very different spatial autocorrelation structures.

The work of Flowerdew and Green (1994) provides a way into understanding the properties of data with spatial autocorrelation. Using simulated data, they explore the outcomes of multiple realizations of areal units at a given scale. The use of simulated data was important, as it enabled them to analyze data with known spatial autocorrelation properties, in comparison with real data where spatial autocorrelations are unknown and may also be affected by other (unmeasured) biases. Green and Flowerdew (1996) aggregated their basic grid of raw simulated data into new areal units in three ways: (a) randomly; (b) systematically, based on the value of one of the simulated variables; and (c) spatially, by combining spatially contiguous blocks. The new zones constructed aspatially with random aggregation show no change in the subsequent correlation or regression outcomes; systematic aggregation increases the correlation coefficient but has no effect on the regression parameter, while spatial aggregation alters the coefficients of both. In conclusion, they argue that the effects of spatial autocorrelation may “result from contiguous processes affecting the distribution of one or more of the variables being analyzed, or
the spatial distribution of other variables which have effects on these." This explicitly expresses the realization that the variables of areal units may display linked characteristics. Developing their work on spatial autocorrelation further, Green and Flowerdew (1996) and Flowerdew and colleagues (2001) extend the analysis to consider the impact of spatial autocorrelation between variables as well as within variables, using the "cross-correlation." They define cross-correlation as the relationship not only between variable X and variable Y at a specific point in space but also between X and Y at neighboring points in space. In Green and Flowerdew (1996), they continue using the simulated data, but this time aggregated into spatially contiguous zones. They then model the relationship between the simulated X and Y, first using a standard regression model and then using a model that incorporates the simulated cross-correlation between X and Y. Green and Flowerdew call the cross-correlation a regional effect, and they introduce a regional term into the regression model so that there is a regression coefficient for the local effect and another for the regional effect. Having used simulated data for an initial exploration, they then repeat the analysis with real data derived from the UK population census. Using unemployment and ethnicity, these authors find evidence that confirms their cross-correlation hypothesis and demonstrates the usefulness of the local and regional regression approaches. In Flowerdew and colleagues (2001), they illustrate the same concept using Fotheringham and Wong's example (1991, see above). They theorize that cross-correlation can occur because the relationship between the "attractiveness of housing (and hence its value and the likely income of the residents) may depend not just on race and class in the immediate vicinity but also on such characteristics in neighboring areas" (Flowerdew et al. 2001, p. 91). The results demonstrate that while the presence of spatial autocorrelation is important in determining the incidence of the scale effect in correlation coefficients, it does not impact the regression coefficients; for the regression coefficients to change, there needs to be cross-correlation between the X and Y variables. Arbia (1989) introduced the term "systematic spatial variation" to create a formal framework for understanding the relationship between the MAUP and spatial autocorrelation, using Cliff and Ord's work (1981) as a starting point. Using data relating to the residential location of population organized on a 32 by 32 lattice, Arbia simulated the MAUP by aggregating the grid into 16 by 16, 8 by 8, 4 by 4, and 2 by 2 configurations. The results demonstrate that aggregation increases the level of variance and that, as the level of aggregation increases, the estimates of the variance become more unreliable because the number of observations diminishes, leaving fewer degrees of freedom. Arbia concluded that the effects of the MAUP under aggregation were the result of the relationships between near objects. Building on this finding, Manley and colleagues (2006) demonstrate that spatial autocorrelation structures rarely match the boundaries of the zones used to represent the data and that this mismatch between the spatial extent of the autocorrelation and the zonal boundaries is, in part, one of the causes of the MAUP. Over time, more complex models were applied to the MAUP.
For instance, Amrhein and Flowerdew (1989) investigated the effects of the MAUP in relation to
Poisson regression. The results of their analysis demonstrated that within the Poisson model there is little zonation effect to be found. However, this is not a cause for celebration for the spatial analyst, as if a methodology to overcome the MAUP had been identified: the lack of effect is a consequence of the analytical technique, not an indication that the results are free from the MAUP. The finding of Amrhein and Flowerdew is nonetheless important because it adds a new dimension to the MAUP discussion: they demonstrate that the choice of model for an analysis is just as critical as the choice of zonation and scale itself. This conclusion does not, however, mean that the world of the analyst dealing with spatial data is as bleak as might initially be presumed. The notion that there are spatial structures in the data which may, or may not, coincide with the boundaries used is an important one. Recent work by Jones and colleagues (2018) takes this idea and assumes that the MAUP occurs because the model being used is mis-specified. As Flowerdew and Green demonstrated, when the data are aggregated randomly there is no MAUP; the altering of results must therefore be a function of the spatiality of the data. By adopting this modeling and information-based approach, Jones and colleagues hypothesize that, if the spatial structure were known, and built into the model, then even under spatial aggregation the MAUP would cease to be an issue. Using simulated census data reflecting the overall population structure and census geography of Leicester, and modeling the data within a multilevel modeling framework, they demonstrate that this is the case. Moreover, once we acknowledge that the spatial structure is critical, the idea that the MAUP is random and unpredictable is demonstrated to be a fallacy: recover the spatial structure and it ceases to be an empirical issue. Thus, the MAUP also becomes a tool for discovering information about the spatial structure of processes acting on the data.
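To make the role of spatial structure concrete, the sketch below implements Moran's I from first principles on a regular lattice with binary rook-contiguity weights, and recomputes it after a 2 by 2 aggregation in the spirit of Arbia's lattice experiments. It is a minimal illustration with simulated, deliberately smoothed data; the smoothing pass and parameter values are arbitrary choices of this sketch, not part of any of the studies cited.

```python
import numpy as np

def morans_i(z):
    """Moran's I for a 2-D array under binary rook (edge-sharing) contiguity."""
    z = z - z.mean()
    num, w_sum = 0.0, 0
    rows, cols = z.shape
    for dr, dc in ((0, 1), (1, 0)):          # horizontal and vertical neighbours
        a, b = z[:rows - dr, :cols - dc], z[dr:, dc:]
        num += 2 * (a * b).sum()             # each pair counted in both directions
        w_sum += 2 * a.size
    return (z.size / w_sum) * num / (z ** 2).sum()

rng = np.random.default_rng(1)
field = rng.normal(size=(32, 32))
# A crude smoothing pass induces positive spatial autocorrelation
smooth = (field + np.roll(field, 1, axis=0) + np.roll(field, 1, axis=1)) / 3

agg = smooth.reshape(16, 2, 16, 2).mean(axis=(1, 3))   # 2 x 2 aggregation
print(f"Moran's I at 32 x 32: {morans_i(smooth):.3f}")
print(f"Moran's I at 16 x 16: {morans_i(agg):.3f}")
```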

3.3 Exploring the MAUP Through Zone Design

A cursory overview of the statistical investigations into the MAUP would suggest that the vast majority of the effort to explain it has been concerned with the scale effect. In fact, the zonation issue has also been tackled extensively and, in some regards, with more success than the scale issue. The zonation research has largely focused on two questions: how can zonations be created that are appropriate to the analytical task, and what are the properties of zonation that lead to the MAUP occurring? The ability to provide multiple realizations of zonal systems within one analysis space enables the scale effect to be investigated further, as many different zonations can be derived as the scale changes. If zoning systems are problematic, then it is useful to consider why and how they may be (re)designed. The rationale is summed up by Openshaw and Rao (1995): "[t]he new opportunity provided by [the increasing availability of digital] boundaries is not to demonstrate the universality of MAUP effects, or to manipulate results by gerrymandering the spatial aggregation used, but it is to design new zoning systems that may help users recover from MAUP." Openshaw (1978) presented two extremes of the approaches to zone design as an
illustration of the problem: a conventional statistical approach within which the spatially aggregated data are viewed as fixed, or a model that assumes the "undefined parameters [are] fixed, and the identification of an appropriate zoning system has to be made in some optimal manner." The first view is unacceptable geographically because of the interdependence between the choice of zones and the results achieved. From a statistical standpoint, the second is just as poor, as it removes comparability between studies. The process of zone design offers a compromise through the creation of a system that satisfies (or at least suffices for) a set of criteria. One ideal outcome would be a set of zones that is as simple as possible, homogeneous (against a single variable or a set of variables defined by the user), and compact. Openshaw (1978) increased the complexity of the problem by suggesting that shape (as distinct from compactness) and population size are also important elements to include. Depending on the task for which the zones are required, each of these criteria may be made more or less important. One of the first attempts at automated zone design was undertaken by Stan Openshaw (1978) with the Automatic Zoning Procedure, implemented in the Automatic Zoning Program (AZP). More recently, the process of zone design has become integrated with the mainstream Geographical Information Science (GISc) literature, enabling users to define their own zonal units. AZP was extended to become the Zone Design System (ZDES) and has been employed in a wide range of zonal scenarios. One prime example is explored by Openshaw and colleagues (1998), who commented on the first fully automated basic spatial unit (bsu) design process undertaken for the publication of the 2001 UK Census data. As Openshaw and colleagues point out, one of the major barriers to successful zone design is the realization that the problem cannot be tackled in the traditional software programming sense, where a global optimal solution is identifiable: if a global optimal solution existed, it is not clear how it would be identified, and in many cases there is no optimal solution. Rather, there is a range of suitable solutions which are sufficient given the criteria that have been input. Other systems have been developed specifically for zonal data analysis and redesign. An alternative to ZDES is AZM (Automated Zone Matching). AZM "[i]mplements zone design on a set of zones described by polygon and arc attribute tables exported from Arc/Info or generated by users' own programs. [The program is designed to optimize] the match between two zonal systems, or the aggregation of a set of building block zones into output areas with a range of user-controlled design parameters" (Martin 2003). AZM uses the AZP procedure outlined by Openshaw (1978) and is conceptually similar. A key advantage of AZM is the ability to control the aggregation process with regard to shape, key-variable homogeneity, and population size simultaneously, so that it is suited to the design of analytically appropriate zonal systems; that is, zonal systems that better reflect the required uses of the data, in contrast to unstructured aggregations, where little or no control over these factors makes them unsuitable where particular scales of aggregation are required. A greatly simplified sketch of this style of iterative zone design is given below.
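As a rough illustration of the iterative logic behind AZP-style zone design (and emphatically not a reimplementation of AZP, ZDES, or AZM), this sketch greedily reassigns grid cells to neighboring zones whenever the move reduces total within-zone variance. The contiguity-repair, shape, and population-size criteria of the real systems are omitted, and all data and parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
size, k = 24, 4
# A smooth surface plus noise gives the zones something to be homogeneous about
value = np.add.outer(np.sin(np.arange(size) / 4), np.cos(np.arange(size) / 5))
value += rng.normal(scale=0.2, size=(size, size))
# Initial zoning: k vertical strips
zone = np.repeat(np.arange(k), size // k)[np.newaxis, :].repeat(size, axis=0)

def sse(z):
    """Total within-zone sum of squared deviations (the homogeneity criterion)."""
    return sum(((value[z == lab] - value[z == lab].mean()) ** 2).sum()
               for lab in np.unique(z))

for sweep in range(5):
    for r in range(size):
        for c in range(size):
            best, base = zone[r, c], sse(zone)
            # Try adopting each neighbouring cell's zone; keep the best improvement
            for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < size and 0 <= cc < size and zone[rr, cc] != zone[r, c]:
                    old, zone[r, c] = zone[r, c], zone[rr, cc]
                    if sse(zone) < base:
                        best, base = zone[r, c], sse(zone)
                    zone[r, c] = old
            zone[r, c] = best
    print(f"sweep {sweep}: total within-zone SSE = {sse(zone):.1f}")
```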
Finally, evidence of the potency of understanding and exploiting zone design was presented by Boyle and Alvanides (2004). In the 1990s and 2000s, money from
the European Commission designated as "Structural Funds" was available to urban areas with the highest levels of inner-city deprivation. Only the most deprived places received money, and the higher up the ranking of the most deprived a city came, the more money was available to it. Using a case study involving the City of Leeds and measures of deprivation, Boyle and Alvanides demonstrate that it is possible to change the ranking of Leeds relative to other cities across the UK by using different boundary systems. Using the 1998 Index of Local Deprivation (ILD) based on the 1991 Census, with units as published, Leeds appeared 56th out of 57 cities. However, by redrawing the boundaries using alternative population thresholds to define the city area, the ranking could be changed to 11th. Further adjustment using smaller geographies (wards) rather than the large local authority districts enabled a further change, making Leeds the 3rd most deprived city in England. Crucially, the initial ranking of 56th would not have secured European funding, while the final ranking (3rd) would ensure a large flow of money into the city. Examples such as these highlight the potential difficulties, opportunities, and concerns that research using aggregated data should address.

4 Conclusion

This chapter has provided an overview of the modifiable areal unit problem (MAUP). With the growth of geocoded data, the potential for analysts to be confronted with areal units is increasing. Knowledge of the potential pitfalls is vital when dealing with areal unit data in analysis, both when the areal units are the objects of the analysis and when areal unit data are included to provide context for other sorts of information. In many cases, it is important to acknowledge the presence of the MAUP while accepting that the results may be conditional on the scale and zonation scheme employed. Previous research has suggested that the MAUP is unpredictable and cannot be solved; it has also demonstrated that spatial autocorrelation and cross-correlation are likely to be very important in understanding the degree and severity of the MAUP. More recently, work has determined that the MAUP is related to spatial structures within the data and that, if these structures can be identified and included in the analysis, the MAUP will cease to be an issue. To regard the MAUP as something to solve or ignore is to deny the geographic nature of the data. As such, it remains a key topic that the analyst using aggregate data should be aware of and acknowledge in their analysis. When dealing with spatially organized data, the analyst must adopt a geographically informed process of hypothesis formation. Analytical scale should become a primary factor that is explicitly considered, rather than an issue that is implicitly dealt with or assumed away in the name of pragmatism, as it all too frequently is. Often, this will require the analyst to consider multiple scales of measurement and analysis, or to construct a highly rigorous spatial framework for the analysis. This chapter is too brief to provide a comprehensive view of all the work that has been conducted on the MAUP. Nevertheless, it hopefully sheds sufficient light on the
subject and processes to provide the reader with the means to adopt a more critical and nuanced approach to their analysis.

5 Cross-References

▶ Cross-Section Spatial Regression Models
▶ Spatial Clustering and Autocorrelation of Health Events
▶ Spatial Data and Spatial Statistics

References

Amrhein C, Flowerdew R (1989) The effect of data aggregation on a Poisson regression model of Canadian migration. In: Goodchild MF, Gopal S (eds) Accuracy of spatial databases. Taylor and Francis, London, pp 229–238
Arbia G (1989) Spatial data configuration in statistical analysis of regional economic and related problems. Kluwer, Dordrecht
Baumann J, Fischer MM, Schubert U (1983) A multiregional labour supply model for Austria: the effects of different regionalisations in labour market modelling. Pap Reg Sci Assoc 52(1):214–218
Boyle P, Alvanides S (2004) Assessing deprivation in English inner city areas: making the case for EC funding for Leeds city. In: Clarke G, Stillwell J (eds) Applied GIS and spatial analysis. Wiley, Chichester, pp 111–136
Cliff AD, Ord JK (1981) Spatial processes: models & applications. Pion, London
Duncan OD, Duncan B (1955) A methodological analysis of segregation indices. Am Sociol Rev 20:210–217
Flowerdew R (2011) How serious is the modifiable areal unit problem for analysis of English census data? Popul Trends 145(Autumn):1–13
Flowerdew R, Green M (1994) Areal interpolation and types of data. In: Fotheringham S, Rogerson P (eds) Spatial analysis and GIS. Taylor and Francis, London, pp 121–145
Flowerdew R, Geddes A, Green M (2001) Behaviour of regression models under random aggregation. In: Tate NJ, Atkinson PM (eds) Modelling scale in geographical information science. Wiley, Chichester, pp 89–104
Fotheringham AS, Wong D (1991) The modifiable areal unit problem in multivariate statistical analysis. Environ Plann A 23(7):1025–1044
Gehlke CE, Biehl K (1934) Certain effects of grouping upon the size of the correlation in census tract material. J Am Stat Assoc 29(Special Suppl):169–170
Getis A (2010) Spatial autocorrelation. In: Fischer MM, Getis A (eds) Handbook of applied spatial analysis: software, methods and applications. Springer, Berlin/Heidelberg/New York, pp 255–278
Green M, Flowerdew R (1996) New evidence on the modifiable areal unit problem. In: Longley P, Batty M (eds) Spatial analysis: modelling in a GIS environment. GeoInformation International, Cambridge, UK, pp 41–54
Heasman MA, Kemp W, Maclaren AM, Trotter P, Gillis CR, Hole DJ (1984) Incidence of leukaemia in young persons in west of Scotland. Lancet 323(8387):1188–1189
Jones K, Manley D, Johnston R, Owen D (2018) Modelling residential segregation as unevenness and clustering: a multilevel modelling approach incorporating spatial dependence and tackling the MAUP. Environ Plann B Urban Anal City Sci 45(6):1122–1141
Kirby AM, Taylor PJ (1976) A geographical analysis of voting patterns in the EEC referendum. Reg Stud 10(2):183–191
Manley D, Flowerdew R, Steel D (2006) Scales, levels and processes: studying spatial patterns of British census variables. Comput Environ Urban Syst 30(1):143–160
Martin D (2003) Developing the automated zoning procedure to reconcile incompatible zoning systems. Int J Geogr Inform Sci 17(1):181–196
Odoi A, Martin SW, Michel P, Holt J, Middleton D, Wilson J (2003) Geographical and temporal distribution of human giardiasis in Ontario, Canada. Int J Health Geogr 2(1):5
Openshaw S (1977) A geographical solution to the scale and aggregation problem in region-building, partitioning and spatial modeling. Trans Inst Br Geogr New Ser 2(4):459–472
Openshaw S (1978) An empirical study of some zone-design criteria. Environ Plann A 10(7):781–794
Openshaw S (1984) The modifiable areal unit problem. CATMOG 38. GeoBooks, Norwich
Openshaw S, Rao L (1995) Algorithms for reengineering 1991 census geography. Environ Plann A 27(3):425–446
Openshaw S, Taylor PJ (1979) A million or so correlation coefficients: three experiments on the modifiable areal unit problem. In: Wrigley N (ed) Statistical applications in the spatial sciences. Pion, London, pp 127–144
Openshaw S, Taylor PJ (1981) The modifiable areal unit problem. In: Bennet RJ, Wrigley N (eds) Quantitative geography. Routledge Kegan Paul, Henley-on-Thames, pp 60–69
Openshaw S, Alvanides S, Whalley S (1998) Some further experiments with designing output areas for the 2001 UK census. Paper presented at the 4th of the ESRC/JISC supported workshops on planning for the 2001 census
Poulsen M, Johnston R, Forrest J (2011) Using local statistics and neighbourhood classifications to portray ethnic residential segregation: a London example. Environ Plann B 38(4):636–658
Tobler W (1970) A computer movie simulating urban growth in the Detroit region. Econ Geogr 46(2):234–240
Wilkie D (1986) Precognition on: review of the Scottish Health Service ISD report on geographical distribution of leukaemia in young persons in Scotland 1968–1983 (document D/P/20). Presented at EDRP public local inquiry, Thurso, September 1986
Wong D (2003) Spatial decomposition of segregation indices: a framework toward measuring segregation at multiple levels. Geograph Anal 35(3):179–194
Wong D (2009) The modifiable areal unit problem (MAUP). In: Fotheringham AS, Rogerson PA (eds) The SAGE handbook of spatial analysis. Sage, London, pp 95–112

Spatial Network Analysis

Clio Andris and David O'Sullivan

Contents
1 Introduction 1728
2 Spatial Networks and Graphs 1730
2.1 Basic Definitions 1731
2.2 Vertex Degree, Graph Density, and Local Clustering 1732
2.3 Spatial Embedding and Planarity 1733
2.4 Shortest Paths, Distances, and Network Efficiencies 1735
2.5 Acyclic Graphs 1737
3 Higher Order Structure in Networks 1737
3.1 Network Centrality 1738
3.2 Community Detection, Modules, and Subgraphs 1738
3.3 Structural Equivalence 1740
4 Generating Networks: Spatial Network Models 1741
4.1 Spatial Networks from Point Patterns 1741
4.2 Spatial Small Worlds 1743
4.3 Growing Spatial Networks: Preferential Attachment 1744
4.4 Dual Graphs: New Graphs from Old 1745
5 Matrix and Adjacency List Representation of Graphs 1745
6 Properties of Real-World Spatial Networks 1746
6.1 Road Networks 1747
6.2 Transport Networks 1747
6.3 Other Spatially Embedded Networks 1748
6.4 Network Routing 1748
7 Conclusions 1749
8 Cross-References 1749
References 1749

C. Andris (*)
Department of Geography, The Pennsylvania State University, University Park, PA, USA
e-mail: [email protected]

D. O'Sullivan
School of Geography, Environment and Earth Science, Victoria University of Wellington, Wellington, New Zealand
e-mail: [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_67

Abstract

Spatial networks organize and structure human social, economic, and cultural systems. The analysis of network structure is rooted in mathematical graph theory, and spatial networks are a special type of graph embedded on the earth's surface. Their analysis therefore necessitates the fusion of graph-theoretic and geographic concepts. Key concepts and definitions from graph theory are reviewed and used to develop a variety of graph structural measures, which can be used to investigate local and global network structure. Particular emphasis is placed on three major concepts: the high-level network structural features of centrality, cohesive subgraphs, and structural equivalence. With these metrics in mind, we describe considerations for their use within a spatial context. Pointers to empirical research on real-world spatial networks are provided.

Keywords

Graph theory · Networks · Flow data · Spatial interaction · Connectivity

1 Introduction

It has become commonplace to think of ourselves as inhabitants of a “networked world.” We travel in a network of infrastructure such as roads, rail, and airline routes. When we make calls or send SMS messages, we send information over wired and wireless networks designed to reach distributed locations almost instantaneously. The commodities that surround us are distributed over complex logistical and transportation networks, often after undergoing even more intricate journeys through the global supply chains that underpin just-in-time manufacturing. Perhaps the most significant contemporary manifestation of the spatial network is the Internet, augmented in recent years by web 2.0 technologies that enable online social networks, and by mobile technologies which maintain those connections even while people move through global transport networks from city to city and continent to continent. If “[t]he most profound technologies are those that disappear” (Weiser 1991, p. 94), then the Internet is by any measure profound, so much so that we only notice it – it only becomes visible – when it is unavailable. Of course, most networks are much older and more obviously geographical than the Internet. Geographers and other spatial scientists thus require network models to depict the effects of flows in and out of places, their magnitudes, and their paths within a global system. Arguably, when it comes to understanding the aggregate geographies of the human world, whether from a social, economic, or cultural perspective, it is networks which structure, constitute, and organize those patterns. Manuel Castells (1996) foresaw (but only just!) this development in his The Rise of the Network
Society. Castells suggests that the network society alters social, economic, and cultural relationships, creating a global "space of flows" not directly associated with any particular location on the Earth's surface. Less radically, other scholars have argued that a key determinant of the relative importance of world cities is not their geographical location per se but their relative location or position in economic, transport, social, and cultural networks. For example, Taylor et al. (2012), using network measures, rank the relative importance of cities to argue that London is an "alpha++" city out-ranking many more populous cities such as Tokyo (alpha+), Seoul (alpha), or Los Angeles (alpha). Similarly, a study of the global flight network suggests that under certain definitions of "centrality," Paris, London, Singapore, and, unexpectedly, Anchorage were the most central cities in terms of worldwide flights (Guimerà et al. 2005), although these are not the busiest airports in the world. What makes these airports rank above others is not their particular individual characteristics or geographical location but their position in relation to other airports via a network of scheduled flights. We define a spatial network as one whose nodes are spatially embedded (Fig. 1). Real-world regional science problems deal with spatial networks whose nodes, and often edges, are geolocated on the earth's surface, i.e., with geographic coordinates. These include significant infrastructure from transport systems and telecommunications to the supply of electricity and water. Geographic networks exist at various scales, from networks of continents to relations among a set of individuals in a small village. While there is no specific size range, the interacting entities in the spatial networks most often used by regional scientists tend to be, at minimum, the size of a human (or smaller species) and, at maximum, smaller than the Earth. A subset of geographic networks is topologically embedded, i.e., fused with other spatial data layers in a GISystem. Regional scientists have most commonly used spatial network (or spatial interaction) data without necessarily considering what surrounds the network in Cartesian space – an approach that both suits the field's major research questions and its slant toward spatial economic or econometric modeling approaches.

Fig. 1 Nested levels of networks. (Source: Created by authors)

However, we are getting ahead of ourselves. Whether or not we consider a network analysis of world cities (or anything else) to be informative, before we
can deploy such methods, we must define terms, delineate scales, and develop measures. As in any field of quantitative study, we need measures to enable repeatable descriptions of the objects of study and models to allow us to determine whether the measurements we make of empirical cases are interesting. In the next section, basic concepts, definitions, and measures from graph theory are introduced. There follows a consideration of higher level concepts of graph structure and associated measures. Throughout these sections, pertinent aspects of spatial networks are discussed. Following this, we introduce some models for spatial networks and comment on their properties. We then consider some significant findings from the rapidly growing literature applying these methods to real spatial networks. The chapter ends with some pointers to possible future directions. Note that we do not consider here the numerous problems in computer science, operations research, and transport analysis (particularly traffic assignment and related problems) which are closely associated with the analysis of spatial networks. As testament to the wide-ranging interpretation of spatial networks, interested readers can find monographs entitled Spatial Network Data devoted to road network routing (Oliver 2016), or the major review paper "Spatial Networks," which finds commonalities in the structure of urban highways and the biological formation of mold (Barthélemy 2011).

2 Spatial Networks and Graphs

In their stimulating and still relevant text Network Analysis in Geography, Haggett and Chorley (1969) follow Kansky (1963) in moving quickly from considering spatial networks to the analysis of mathematical graphs. Real spatial networks are complicated physical entities, with numerous elements, themselves often complex entities, such as multilane highways or airports with several runways (see Fischer 2004 for how these complications may be handled in a GIS setting). Our primary interest in the analysis of spatial networks lies in understanding how the network structures connectivity so as to centralize some locations, marginalize others, and, in general, differently position locations with respect to one another. We may also be interested in how network connectivity alters the characteristics of origin and destination locations, and how networks may connect two distant places while potentially impeding interactions between geographically proximal places. In the first instance, it makes sense to strip away the messy complication of real spatial networks and work with the simpler, abstract representation of a mathematical graph. We therefore begin with definitions from graph theory, which provides a foundation for the analysis of networks. A graph is a mathematical abstraction which can represent any set of elements somehow related to one another. Wilson (1996) provides a succinct introduction to the key terms and concepts discussed below. More advanced references delve into this field of discrete mathematics (Gross and Yellen 2006), which is fundamental to computer science and is increasingly considered fundamental across all the sciences (Newman 2010).

2.1 Basic Definitions

A graph G consists of a finite, non-empty set V = {vi} of vertices and a finite non-empty set E = {ei} of distinct, unordered pairs of distinct elements in V, called edges. The number of elements in V, commonly denoted n, is the order of G. The number of edges in E is often denoted m. Figure 2 shows a typical small graph with V = {a, b, c, d, e, f, g, h}, E = {ab, bc, cd, cf, de, dg, dh, eg, fg, gh}, n = 8, and m = 10. The edge vivj or eij is said to join (more commonly link or connect) the vertices vi and vj, and these vertices are considered adjacent. We say that eij is incident with vi and vj, and that vi is a neighbor of vj. The neighborhood N(vi) of vi, often written simply Ni, is the set of vertices adjacent to vi. Two edges incident with the same vertex are adjacent edges. In Fig. 2, Nb = {a, c}, and edges ab and bc are adjacent.

Fig. 2 A typical graph. (Source: Created by authors using synthetic data)

Given the ubiquity of graphs (or networks), it should come as no surprise that there is considerable confusion around terminology, with different fields adopting different terms in various contexts. Vertices are often referred to as nodes and represent the entities in a network, such as cities, people, cell phone towers, airports, railroad stations, and so on. Edges are commonly referred to as links, flows, or connections and represent relationships between nodes, such as movements of goods, services, or people, the existence of airline routes, mutual intervisibility, and so on. We can think of graphs as mathematical abstractions of networks which exist in the real world, in much the same way that variables represent measurements of real phenomena – this is the distinction between vertices and nodes, edges and links, and so on. In this section, while introducing formal definitions from graph theory, we adopt the proper mathematical terms but elsewhere may return to widely used synonyms (such as network, node, link, etc.).

The structure G = (V, E) described so far is a simple graph, which has limited relevance to the representation of complicated real-world networks. We may also want to include cases where vertices may be joined to themselves by a loop or self loop vivi, and multiple edges may also be allowed if we drop the requirement that edges be distinct, resulting in a multiplex network or multigraph. More significantly, directed graphs (sometimes referred to as digraphs) consist of a set of vertices V and a set of arcs A, or directed edges, each of which consists of an ordered pair of vertices in V, implying directionality in the relationship between the vertices. This departure from the simple graph allows us to consider relationships where flows in each direction may be different (or even nonexistent in one direction) and is obviously an important consideration in many real-world infrastructure or distribution networks. Another variant on the simple graph is the weighted graph, where each edge has an associated value or weight, often denoted wij, representing some attribute of the relationship between the vertices it joins. Two conceptually orthogonal types of edge weights are commonly used: a metric defining edge traversal cost and a metric defining edge strength. The most obvious attribute of interest in many geographical applications is the length of the edge, measured either as Euclidean distance, travel distance, or some other cost factor such as travel time or monetary cost. More generally, edge weights may represent some cost associated with movement along the edge. Less obviously, but equally applicable, are edge weights that represent the strength of the relationship between the vertices they join. The volume or value of trade between two countries or the number of daily flights between two airports are just two examples among many possibilities. In many cases, weights relating to the strength of a relationship between the incident vertices will reflect rates of flow or the capacity of the associated edges (Bergmann and O'Sullivan 2018).
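These definitions translate directly into code. A minimal sketch using the networkx Python library (an assumed dependency throughout the examples in this chapter) constructs the graph of Fig. 2 and queries the basic quantities just defined; the directed, weighted variant at the end is purely illustrative.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "d"), ("c", "f"), ("d", "e"),
                  ("d", "g"), ("d", "h"), ("e", "g"), ("f", "g"), ("g", "h")])

print(G.number_of_nodes(), G.number_of_edges())    # n = 8, m = 10
print(sorted(G.neighbors("b")))                    # N(b) = ['a', 'c']
print(G.has_edge("a", "b"), G.has_edge("a", "c"))  # adjacency: True False

# A weighted, directed variant: an arc a -> b carrying a flow of 3 units
D = nx.DiGraph()
D.add_edge("a", "b", weight=3)
print(D["a"]["b"]["weight"], D.has_edge("b", "a"))  # 3 False
```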

2.2 Vertex Degree, Graph Density, and Local Clustering

Even the limited graph theoretic concepts introduced so far allow us to develop useful descriptive measures of graph structure. Most obviously, the number of edges incident with a vertex (alternatively, the number of vertices in its neighborhood) is its degree, denoted deg(vi) or ki. The average vertex degree is a useful summary measure of graph structure, given by

$$\bar{k} = \frac{2m}{n} \qquad (1)$$

since each edge is incident with two vertices. This measure is equivalent to Kansky's β index (1963), differing only by a constant multiplier. The degree list of a graph is the set of vertex degrees, often arranged in order of increasing degree. For the graph in Fig. 2, the degree list is {1, 2, 2, 2, 2, 3, 4, 4}. In large graphs representing complex real-world networks, it is more useful to examine the degree distribution of the vertices. The degree distribution resembles a probability density function that describes the relative probability of encountering a node of given degree k. As we shall see, the degree distributions of many spatial networks are strongly constrained by their spatial embedding. If all vertices of a graph have the same degree k, the graph is regular of degree k, or k-regular, producing a uniform degree distribution. In practice, this is unlikely to occur in spatial networks but may provide a useful benchmark or null model for assessment of an observed network's structural regularity. For a simple graph with n vertices, the maximum number of edges that could exist is given by $\binom{n}{2} = n(n-1)/2$. Comparing the actual number of edges in the graph to
this maximum provides a measure of how strongly connected the graph is overall, namely its density, $\rho = m/\binom{n}{2} = 2m/n(n-1)$. A graph's density is thus the fraction of the possible edges that actually exist. The undirected graph in Fig. 2 has density 2 × 10/(8 × 7) = 0.357. Directed graphs have twice as many possible edges, one for each direction between each pair of nodes. Because the number of possible edges in a graph grows approximately with the square of its order, whereas in most spatial networks the number of edges grows roughly linearly with the number of vertices, most spatial networks have low density, and only a small proportion of all the possible connections exist. This generally arises either because of distance decay effects, where direct connections over great distances become sparse due to low demand, or due to planarity constraints; for example, a train station can only accommodate so many sets of physical tracks in its circumference. We consider both issues in more detail below. Because of the constraints on overall graph density in spatial networks, it is often more interesting to consider the local density or clustering of a spatial network. This is a measure of how strongly connected the graph is in the neighborhood of each vertex. The clustering coefficient of a particular vertex is

$$C(v_i) = \frac{2 m_i}{k_i (k_i - 1)} \qquad (2)$$

where mi is the number of edges joining vertices in the neighborhood of vi. This is a direct localized equivalent to graph density and provides information about how well connected the network is locally. The distribution of the clustering coefficient in a network provides useful information about its structure. One interpretation is that it gives the probability, given that two vertices vj and vk are neighbors of vi, that vj and vk are themselves neighbors. Many spatial networks exhibit high clustering coefficients compared to nonspatial networks, which is unsurprising: if two vertices in a spatial network are neighbors, it implies that they are near one another, and if two vertices share a common neighbor, since they are probably also near one another, there is a high chance that they will be neighbors of one another.
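These summary measures can be illustrated on the same small graph. The sketch below (again assuming networkx) reports the degree list, the mean degree of Eq. (1), the density, and the per-vertex clustering coefficients of Eq. (2).

```python
import networkx as nx

G = nx.Graph([("a", "b"), ("b", "c"), ("c", "d"), ("c", "f"), ("d", "e"),
              ("d", "g"), ("d", "h"), ("e", "g"), ("f", "g"), ("g", "h")])

print(sorted(d for _, d in G.degree()))               # degree list [1, 2, 2, 2, 2, 3, 4, 4]
print(2 * G.number_of_edges() / G.number_of_nodes())  # mean degree, Eq. (1): 2.5
print(nx.density(G))                                  # 2m / n(n-1) = 0.357...
print(nx.clustering(G))                               # C(v_i) of Eq. (2), per vertex
```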

2.3 Spatial Embedding and Planarity

Thus far, there has been no explicit consideration of the spatial or geographic aspect. Where we are concerned with spatial networks, vertices will have an associated spatial entity, often conveniently considered to be a point location, but potentially also a more complex spatial entity such as a region – for example, in a trade network, vertices may represent regions or countries. Two types of spatial embedding of a graph are possible. The most obvious spatial networks are those where both vertices and edges are spatially embedded. Examples include transport and infrastructure networks, where the graph edges are physically realized in space, with direct implications for any associated weights or directional restrictions. Less obviously, spatial embedding of edges imposes a constraint on the
overall network structure, that of planarity. A planar graph is one which can be drawn in two dimensions with no edges intersecting except at vertices with which they are both incident. For many infrastructure networks, this is approximately true, although bridges and tunnels in ground-transport networks are an obvious (but generally minor) exception (see Table 1). The planarity constraint significantly alters the overall structure of graphs, and we consider its implications below. A second form of spatial embedding is where vertices have associated spatial locations, but edges represent nonspatial relationships. An example is a spatially embedded social network. Individuals in the network have some spatial location – perhaps their home address, the zip code associated with a credit card, or a migration trajectory – and edges might represent friendship or acquaintanceship relationships. Such networks rarely constitute the primary object of analysis, although they may easily arise as dual graphs in some analyses. In considering such a network to be a spatial network, we implicitly assume that the distance between vertices (whether direct Euclidean distance or distance over intervening spatial networks – see below) has an effect on the probability that edges exist. In other words, we expect that vertices more remote from one another are less likely to be joined than those that are closer together. Where the distinction matters, we will refer below to fully embedded or vertex embedded spatial networks, reserving the term spatial networks to refer to networks of either kind. A fundamental difference between spatial networks with spatially embedded edges, which are (approximately) planar, and spatial networks not affected by this constraint lies in the limits planarity places on the overall density of the graph, both globally and locally. A key result is Euler's formula for planar graphs:

$$n - m + f = 2 \qquad (3)$$

where n and m are the number of vertices and edges as before, and f is the number of faces, the regions into which the graph divides the plane. We count the unbounded region in which the graph is embedded as a face, so that for the graph in Fig. 2, f = 4: the outer region and the regions cdgf, deg, and dhge.

Table 1 Types of real-world spatial networks (based on Andris et al. 2018)

Example network type          Flow entity            Planarity*
Type 1: Stream                Nature/materials       Planar
Type 2: Power grid            Energy/utilities       Planar
Type 3: Roads/rail            Transportation         Planar
Type 4: Freight               Goods                  Both
Type 5: Travel                Movement activity      Both
Type 6: Telecommunications    Information transfer   Nonplanar
Type 7: Social networks       Relationship evidence  Nonplanar

* Planarity is not guaranteed for all forms of transport, particularly by air or by sea, although it is approximately true for road and rail

Fig. 3 How a planar graph grows as edges are added. Either the number of faces f (upper path) or the number of vertices n (lower path) must increase, but not both. (Source: Created by authors using synthetic data)

Euler's result is
easily proved when we consider starting from a graph consisting of one vertex and no edges, so that n = 1, m = 0, and f = 1, when Eq. (3) clearly holds (see Fig. 3). Adding any edge while maintaining planarity either (i) joins two existing vertices without intersecting an existing edge, increasing both m and f by one while leaving n unchanged, or (ii) adds a new vertex and joins it with an edge, increasing both n and m by one with no change in f. In either case, Eq. (3) remains true. Euler's formula has important implications for the possible density of planar graphs. Since every face is bounded by at least three edges, and each edge borders at most two faces, we know that m ≥ 3f/2. Combining this result with Eq. (3), we arrive at

$$m \le 3n - 6 \qquad (4)$$

Combining this result with Eq. (1) tells us that the upper bound on the mean degree of a planar graph is k̄ ≤ 6. Kansky (1963, p. 18) recognizes this in providing alternative formulations of his γ index (equivalent to graph density) for planar and nonplanar graphs. With this result in hand, it is much easier to understand why area maps are so distinctively structured, and why the Voronoi tessellation and associated Delaunay triangulation exhibit such characteristic structure and also tend to produce the relatively uniform degree distributions that are rarely seen in biological or social networks. Since the spatial network constructed from the adjacency relations of a set of polygonal regions must necessarily be planar, the mean number of neighbors of each region cannot exceed six. In graph terms, this bound on the number of edges in a planar graph, and thus on many spatial networks, means that almost all spatially embedded networks are sparse, with m ≪ n² and ρ ∝ 1/n as n → ∞. In terms of local clustering, planarity also implies that any vertex with ki ≥ 4 must have Ci < 1, since no planar graph on more than four vertices can be fully connected.
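These planar bounds are easy to check empirically. The sketch below (assuming numpy and scipy are available) builds the Delaunay triangulation of a random point set, one of the densest planar graphs possible, and verifies that m ≤ 3n − 6 and that the mean degree stays below six.

```python
import numpy as np
from scipy.spatial import Delaunay

pts = np.random.default_rng(0).random((500, 2))
tri = Delaunay(pts)
# Collect each triangle's three edges as sorted vertex pairs to deduplicate
edges = {tuple(sorted((s[i], s[j])))
         for s in tri.simplices for i, j in ((0, 1), (1, 2), (0, 2))}
n, m = len(pts), len(edges)
print(m <= 3 * n - 6, 2 * m / n)   # True, mean degree just under 6
```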

2.4 Shortest Paths, Distances, and Network Efficiencies

A particular sequence of edges {v0v1, v1v2, . . . , vr – 1 vr} forms a walk of length r. If the vertices in a walk are distinct, then it is a path, and a path that begins and ends at
the same vertex forms a cycle. The distance between vertices vi and vj, dij, is the length of the shortest path among the set of all possible paths between vi and vj. Any one path between vi and vj of length dij is a geodesic. The largest distance between any two vertices in a graph is its diameter. In a directed graph, walks may only proceed in the direction of constituent arcs. In a weighted digraph, the length of a path is generalized from the above definitions by summing the weights of its constituent edges, and the distance between two vertices is the length of the shortest path, as before. Given that the weights associated with the arcs in each direction between any two vertices are not necessarily the same (that is, wij ≠ wji), there is no guarantee that the distances between vertices in a directed graph will be symmetric. Note also that where graph weights do not represent a "traversal cost," such as when they represent link capacities, trade volumes, numbers of people, or other similar measures, it does not make sense to accumulate edge weights in this way. The (graph) distances between any two nodes, or between all pairs of nodes, in a spatial network are of considerable interest, particularly in how they compare to the corresponding straight-line (Euclidean) distances between the node locations. If a network provides a path between two nodes whose distance is close to the straight-line distance, then the network is efficient for that particular journey. On the other hand, if the network involves a very much longer and circuitous path between two locations, it is inefficient. A measure of the network efficiency for a single particular path is the route factor, defined (see Black 2003) as

$$Q_{ij} = \frac{d_G(v_i, v_j)}{d_E(v_i, v_j)} \qquad (5)$$

where dG and dE are graph-based and Euclidean distances, respectively, between locations i and j. We can average this quantity over a particular node:

$$Q_i = \frac{1}{n} \sum_j Q_{ij} \qquad (6)$$

or over the whole network:

$$\bar{Q} = \frac{1}{n(n-1)} \sum_i \sum_{j \ne i} Q_{ij} \qquad (7)$$
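A minimal sketch of the route factor of Eq. (5), using networkx shortest paths over a tiny synthetic network (all coordinates and node labels are invented for illustration):

```python
import math
import networkx as nx

pos = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1), 4: (2, 1)}
G = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0), (2, 4)])
for u, v in G.edges:
    G[u][v]["length"] = math.dist(pos[u], pos[v])   # Euclidean edge lengths

def route_factor(u, v):
    d_g = nx.shortest_path_length(G, u, v, weight="length")  # graph distance
    return d_g / math.dist(pos[u], pos[v])                   # Eq. (5)

print(route_factor(0, 4))   # ~1.34: the network path is a third longer than the crow flies
```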

Traditionally, the cost of traversing a network has been calculated by adding the cost (duration, length, monetary cost using gas and tolls, etc.) incurred along each edge traversed between the origin and destination. Shortest-path algorithms, including the classic Dijkstra’s algorithm (Dijkstra 1959), iterate over multiple paths until the most cost-effective route is determined. Solutions to problems like the traveling salesman problem (TSP) are evaluated according to the time a computer takes to solve them, and how this convergence time increases with more nodes and edges to
consider. In operations and logistics, optimal facility placement problems rely on the structure of the surrounding network. The network is used to measure the relative accessibility from all network locations to all other locations and the service areas that proposed facilities create. The state-of-the-art in planar network optimization includes k-means methods for event clustering and discovery on networks, as well as Monte Carlo methods that help sidestep the costly calculation of each possible network path by prioritizing the evaluation of options more likely to be fruitful (Oliver 2016).
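For reference, a compact self-contained version of Dijkstra's algorithm, accumulating edge traversal costs from an origin to every reachable node as described above (the toy road network is invented for illustration):

```python
import heapq

def dijkstra(adj, source):
    """adj: {node: [(neighbour, edge_cost), ...]} -> {node: least total cost}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                          # stale queue entry, already improved
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

roads = {"a": [("b", 4), ("c", 2)], "b": [("d", 5)],
         "c": [("b", 1), ("d", 8)], "d": []}
print(dijkstra(roads, "a"))   # {'a': 0.0, 'b': 3.0, 'c': 2.0, 'd': 8.0}
```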

2.5 Acyclic Graphs

Fig. 4 The minimum spanning tree of a set of points. (Source: Created by authors using synthetic data)

A tree is a special type of graph which includes no cycles, in which n = m + 1. The minimum spanning tree of a set of vertices is the tree which minimizes the total weight of the edges in the tree, and when the weights relate to the cost of providing the associated node-to-node links, it represents the cheapest way to connect every node to every other. However, such a network is unlikely to be efficient from the point of view of a user of the network as measured using the route factor, since it will certainly involve many very circuitous shortest paths (see Fig. 4). Such a network is also vulnerable to failure, since losing just one edge will leave it disconnected. In practice, real networks almost always have more edges than the minimum spanning tree.
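A sketch of the minimum spanning tree computation (assuming networkx; the point set is synthetic, cf. Fig. 4):

```python
import itertools
import math
import random
import networkx as nx

random.seed(3)
pts = {i: (random.random(), random.random()) for i in range(20)}
G = nx.Graph()
for i, j in itertools.combinations(pts, 2):      # complete graph on the points
    G.add_edge(i, j, weight=math.dist(pts[i], pts[j]))

mst = nx.minimum_spanning_tree(G)
print(mst.number_of_edges())                     # n - 1 = 19 edges, no cycles
print(round(sum(d["weight"] for *_, d in mst.edges(data=True)), 3))
```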

3 Higher Order Structure in Networks

The definitions and measures we have considered so far focus on the overall structure of a network in a general way or on the structure at a particular location. Summary measures or distributional properties of these measures are useful, but they
often fail to reveal structural aspects of networks which arise out of the totality of all the spatial relationships in the network. In this section, we briefly consider measures of such higher order structure.

3.1 Network Centrality

Consideration of distance in networks leads naturally to questions of the most accessible or central locations in the network. An obvious approach is to calculate the mean distance from a node to every other node in the network:

$$c_c = \frac{1}{n-1} \sum_{j \ne i} d_{ij} \qquad (8)$$

where dij is the graph distance between vertices vi and vj as previously defined. Using this closeness centrality measure, the most central node in a network is that with the minimum cc. While this is an obvious measure of network centrality, there are many alternatives. One which has received considerable attention in recent years, because of its close relationship to movement on the network and to how subregions of the graph are connected to one another (see below), is betweenness centrality. The betweenness centrality of a vertex vi is the proportion of the shortest paths between all other pairs of vertices vj ≠ vk in which vi appears. If gjk(vi) is the number of shortest paths from vj to vk in which vi appears, and gjk the total number of shortest paths from vj to vk, then

$$c_b = \sum_{j \ne k} \frac{g_{jk}(v_i)}{g_{jk}} \qquad (9)$$

This measure has the nice property that it can be readily extended to edges also, simply being the proportion of all shortest paths on which each edge lies. Betweenness centrality provides an indication of the extent to which each vertex or edge has the potential to control movement or communication in the graph, assuming that there is "everywhere to everywhere" movement in the system. This measure is directly related to approaches that rely on random walk models: the most central vertices and edges measured in this way will experience the most traffic when a population of random walkers moves around the system. Another metric of centrality, eigenvector centrality, privileges nodes that are connected to nodes with high centrality; a classic example is Google's famous PageRank algorithm, which gives a high score to websites that popular websites link to.
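Both centrality measures are available in networkx; a minimal sketch for the Fig. 2 graph:

```python
import networkx as nx

G = nx.Graph([("a", "b"), ("b", "c"), ("c", "d"), ("c", "f"), ("d", "e"),
              ("d", "g"), ("d", "h"), ("e", "g"), ("f", "g"), ("g", "h")])

closeness = nx.closeness_centrality(G)      # (n-1) / sum of distances, i.e. 1 / c_c of Eq. (8)
betweenness = nx.betweenness_centrality(G)  # normalised c_b of Eq. (9)
print(sorted(closeness, key=closeness.get, reverse=True)[:2])  # ['c', 'd'] tie as most central
print(max(betweenness, key=betweenness.get))                   # 'c' lies on the most shortest paths
```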

3.2 Community Detection, Modules, and Subgraphs

A class of measures which remains difficult to define precisely, but which has a clear intuitive interpretation has recently come to the fore, as researchers attempt to
determine how a network can be broken into cohesive subgraphs or regions, now most often referred to as communities. Fortunato and Hric (2016) provide a comprehensive overview of this literature. The general definition of a community is that the member vertices of a community are more strongly connected to one another within the community than to vertices outside the community. This definition is not very precise, however. Taken to the extreme, we could argue that any joined pair of vertices is more closely connected to one another than they are, on average, to the rest of the graph (unless the graph is fully connected). A less extreme definition is to consider communities as small fully connected subgraphs or cliques (from their origins in social network analysis; see Wasserman and Faust 1994), but due to spatial constraints, cliques are unlikely in fully embedded spatial networks and so unlikely to be useful. We obviously need a more flexible definition. Many have been suggested in the social networks literature (see Wasserman and Faust 1994, pp. 257–267), but most suffer from serious computational challenges in identifying them in graphs of any size, because of the exponential growth in the number of subsets of the vertex set V as graphs get larger. This situation means we must attend closely to the details of community detection algorithms when considering research in this area. Community detection methods can be roughly categorized into divisive (starting with a whole graph and removing weak edges) and agglomerative (starting with only nodes and adding key edges), although some methods use both processes. Either approach can be used to create network subgraph hierarchies. An important breakthrough has been the development of more computationally tractable methods applicable to large graphs, beginning with the Girvan-Newman algorithm (Girvan and Newman 2002). Many of these methods are based on heuristic approaches, which successively remove edges of high betweenness centrality while repeatedly recalculating some measure of the quality of the resulting graph decomposition. Other methods aim to identify hierarchically nested communities and can consequently deal with very large networks with millions of nodes. It is important to note that finding optimal subgraphs requires a certain amount of iteration and often considerable interpretative effort from the analyst (see Dash Nelson and Rae 2016). None of these methods is explicitly spatial, although when edges are dependent on spatial proximity, this should not be a cause for concern. Some community detection methods ensure that the resultant communities are spatially contiguous. Other methods partition network nodes based solely on their network connectivity, and when nodes are geolocated, these communities can be geographically discontinuous. This topic is becoming highly relevant to regional science, as regions can be built up using networks of travel, migration, and social interaction such as phone calls (Sobolevsky et al. 2013). Such "bottom-up" regions can then be compared to the boundaries of administrative districts to investigate how well a society's network is bounded by how government administration is organized geographically. The net result of these considerations is that the current working definition of a graph community is a circular one: graph communities are those subgraphs in a graph identified by a community-detection algorithm. This places considerable

Fig. 5 Centrality and network communities illustrated. See text for details. (Source: Created by authors using synthetic data)

Vertex centrality and community structure for a typical spatial network are illustrated in Fig. 5. Figure 5a shows a vertex centrality calculated from the total path length from each vertex to every other vertex, taking into account the Euclidean lengths of the graph edges, with the darkest shaded vertices the most central. As is often the case, the most central vertices are those that are most geographically central to the network. By contrast, Fig. 5b shows the betweenness centrality based only on the network topology. Because, in topological terms, the vertices to the west of the central part of this network provide a shortcut around the densely packed central region, these vertices are highlighted by this centrality measure. Finally, Fig. 5c shows a possible community structure in this network, where seven distinct regions have been identified based on their mutual connectivity.
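As a concrete illustration, the sketch below computes the two centralities and a community partition for a synthetic geometric network broadly comparable to Fig. 5. It is a minimal example assuming Python with the networkx library (a tooling choice made here for illustration, not one made in the chapter).

```python
# Sketch: centrality and community detection on a synthetic spatial network.
import networkx as nx
from networkx.algorithms.community import girvan_newman

# A random geometric graph stands in for a spatial network; networkx stores
# the node coordinates in the "pos" attribute.
G = nx.random_geometric_graph(100, 0.2, seed=42)
pos = nx.get_node_attributes(G, "pos")
for u, v in G.edges():
    dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
    G[u][v]["length"] = (dx * dx + dy * dy) ** 0.5  # Euclidean edge length

# Distance-weighted closeness centrality (cf. Fig. 5a) and purely
# topological betweenness centrality (cf. Fig. 5b).
closeness = nx.closeness_centrality(G, distance="length")
betweenness = nx.betweenness_centrality(G)

# Divisive community detection: the Girvan-Newman algorithm repeatedly
# removes the edge with the highest betweenness; each iteration yields the
# next level of the subgraph hierarchy (cf. Fig. 5c).
first_split = next(girvan_newman(G))
print(len(first_split), "communities at the first level of the hierarchy")
```

As the text notes, choosing which level of the resulting hierarchy to report, and interpreting the communities so identified, remains the analyst’s task.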

3.3

Structural Equivalence

A final graph structural characteristic likely to be of interest is the concept of structural equivalence among vertices or edges. This concept is relatively easily grasped, but precise definitions are mathematically challenging, and detection of structurally equivalent sets of vertices remains difficult. The idea is that structurally equivalent vertices stand in similar relationships to the rest of the graph, an idea whose origins lie in social network analysis, where structural equivalence is related to social roles (Lorrain and White 1971). Graph communities are a special case of structurally equivalent vertices, which share the property of being in the same community. In a transport network, we might expect major junctions on arterial routes to constitute an equivalence class.

However, pinning down how this concept can be realized in practice has proved difficult, and detection of structurally equivalent (or, more usefully, structurally similar) sets of vertices is computationally challenging – consider that while community detection can assume that the subgraphs of interest are connected subsets of the graph, no such assumption can be made for structural equivalence classes. While the concept of structural equivalence is attractive for the analysis of spatial networks, progress in this area remains rather limited. Recent developments in graph block modeling show promise (Bergmann and O’Sullivan 2018) but remain challenging to apply and interpret.

4

Generating Networks: Spatial Network Models

While measures of network structure are important tools in improving our understanding of spatial networks, it is equally important to develop models of network formation. This was recognized by Haggett and Chorley in their coverage of [network] “Growth and Transformation” (1969, pp. 261–318), an extensive chapter and a very modern treatment. A recent review paper (Barthélemy 2011) provides a useful overview of many different spatial network models. Such models are especially important because they act as “null hypotheses” with which to compare a real-world network, in order to determine whether the real-world network has significant characteristics that deviate from expectations. Here we briefly review some of the available models and consider their general properties.

An important null model for any network is the much-studied Erdös-Rényi (E-R) model (see Erdös and Rényi 1960). The E-R graph is generated as follows: create a set of n vertices, then consider every possible pair of vertices and, with probability p, join them with an edge. In this random network, the overall degree distribution P(k) is well approximated by a Poisson distribution. Of particular interest are the expected mean clustering coefficient and path length of vertices in the graph, once the network is sufficiently dense to be connected with no isolated clusters, an event which happens quite suddenly close to p = ln n/n. The expected clustering coefficient ⟨C⟩ is given by p, since p is the probability that any two vertices will be connected, and so also the likely proportion of the neighbors of any vertex that will be connected to one another. The expected shortest path length ⟨d⟩ in the E-R graph is approximated by ln n/ln ⟨k⟩, where ⟨k⟩ is the mean vertex degree. The configuration model is a popular null model that creates different iterations of a random graph while maintaining a given degree distribution. For instance, in a social network, this reconfiguration allows for the same number of popular and unpopular nodes but reshuffles with whom they are connected.
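A minimal sketch, assuming Python with networkx, that generates an E-R graph and compares measured values against the approximations ⟨C⟩ = p and ⟨d⟩ ≈ ln n/ln ⟨k⟩ given above; the final line illustrates the configuration model as a degree-preserving null model. All parameter values are illustrative.

```python
# Sketch: checking the E-R approximations for <C> and <d> numerically.
import math
import networkx as nx

n = 1000
p = 2 * math.log(n) / n               # comfortably above the ln(n)/n threshold
G = nx.erdos_renyi_graph(n, p, seed=1)
assert nx.is_connected(G)             # expected with high probability at this p

k_mean = 2 * G.number_of_edges() / n
print("<C> measured:", nx.average_clustering(G), " expected:", p)
print("<d> measured:", nx.average_shortest_path_length(G),
      " expected:", math.log(n) / math.log(k_mean))

# Configuration model: reshuffle who is connected to whom while holding
# the degree sequence (the number of popular and unpopular nodes) fixed.
H = nx.configuration_model([d for _, d in G.degree()], seed=1)
```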

4.1

Spatial Networks from Point Patterns

Perhaps the most obvious way to generate a spatial network is to begin with a point pattern and then to apply some geometric rules by which points are connected to one another (or not).

This geometric graph model admits considerable variety in the outcomes, depending both on the underlying point pattern and on the geometric rules applied. It also has the property that if we make the “rule” for joining nodes independent of the distance between them, then it is equivalent to the E-R random graph. More reasonable rules will be familiar from the construction of spatial weight matrices (see Fischer and Wang 2011). Under a distance criterion, two nodes are joined if they are closer together than some threshold distance; in Fig. 6a, nodes nearer than five units apart are connected. Under a nearest neighbor criterion, each node is joined to its k nearest neighbors; in Fig. 6b, each node is joined to its four nearest neighbors. Under an attribute-distance rule, nodes are joined or not depending on some attribute of the nodes and their separation distance. The simplest form of this rule is where the attribute is a radius of influence r_i, and nodes i and j are joined if their circles of influence intersect, that is, if r_i + r_j exceeds the distance between them.

Fig. 6 Examples of geometric networks as described in the text. The region is 40 units east-west and 60 north-south, and the point pattern is inhomogeneous Poisson with greater intensity at the center of the region. Point symbols in (c) are scaled so that if the circles of two points intersect they are joined in the network. (Source: Created by authors using synthetic data)

An example is shown in Fig. 6c. Such a simple rule might be meaningful in the context of trees in a forest influencing one another, but in regional science, a more likely formulation will be based on an interaction measure such as m_i m_j d_ij^(-α), where the m values represent activity or population at each location and α is a constant controlling the rate at which likely connection falls away with distance. Purely geometric rules, such as those governing the Delaunay triangulation or the closely related Gabriel graph (see Okabe et al. 2000), are illustrated in Fig. 6d and e, respectively. A global rule, such as that governing construction of the minimum spanning tree, selects the edges which together connect the network with the minimum total edge length; an example is shown in Fig. 6f. As is clear from Fig. 6, the various geometric rules produce quite different networks. An important distinction is that the Delaunay, Gabriel, and minimum spanning tree networks are planar, whereas the other networks are not. As well as not guaranteeing planarity, the distance and attribute-distance models may also leave some nodes unconnected. It is unusual for real-world spatial networks to leave isolated regions, and so it may be that hybrid models, where a distance criterion is applied subject to a connectivity and/or planarity requirement, are more reasonable in some cases. One approach is to start with a network substrate such as the minimum spanning tree or Gabriel graph and add additional edges according to a distance criterion.
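The sketch below, assuming Python with numpy, scipy, and networkx, builds three of these networks from the same point pattern: a distance-threshold graph, a k-nearest-neighbor graph, and a minimum spanning tree. The region size and thresholds loosely echo Fig. 6 but are otherwise arbitrary.

```python
# Sketch: geometric networks from a point pattern under three rules.
import numpy as np
import networkx as nx
from scipy.spatial import cKDTree
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
pts = rng.uniform(0, 40, size=(50, 2))      # 50 points in a 40 x 40 region
d = squareform(pdist(pts))                  # pairwise Euclidean distances

# Distance criterion (cf. Fig. 6a): join nodes nearer than 5 units.
adj = np.where(d < 5, 1, 0) - np.eye(len(pts), dtype=int)   # zero the diagonal
G_dist = nx.from_numpy_array(adj)

# Nearest neighbor criterion (cf. Fig. 6b): each node to its 4 nearest.
# The relation is not symmetric, so the natural result is a directed graph.
_, idx = cKDTree(pts).query(pts, k=5)       # column 0 is the point itself
G_knn = nx.DiGraph((i, j) for i, row in enumerate(idx) for j in row[1:])

# Global rule (cf. Fig. 6f): minimum spanning tree of the complete
# distance-weighted graph.
G_mst = nx.minimum_spanning_tree(nx.from_numpy_array(d))
```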

4.2

Spatial Small Worlds

Small-world networks are so-called after the commonly encountered apparent contradiction in social networks that while they are locally highly connected (that is, they have high clustering coefficients), they also have globally short paths (that is, the mean path length is low). It is apparent that this result does not hold for graphs generated by the E-R model. As noted, E-R graphs become connected when p = ln n/n and ⟨k⟩ ≈ ln n. For a network with 1000 nodes, this gives us ⟨C⟩ = p ≈ 0.00691, ⟨k⟩ ≈ 6.91, and ⟨d⟩ ≈ 3.57. Increasing n to 10^6 reduces ⟨C⟩ to 1.38 × 10^-5 while ⟨d⟩ only increases to 5.26. Clearly, although E-R networks are small worlds with short path lengths, they do not have the high local connectivity that makes this property in social networks surprising. Watts and Strogatz (1998) presented an alternative network model that is both highly clustered locally, yet has short mean path lengths. Their approach is to start with a regular lattice and “rewire” it by breaking links and reconnecting them to other nodes selected at random from anywhere in the network. They show that only small numbers of rewiring events are necessary to dramatically reduce the mean path length in a lattice. Although Watts and Strogatz present their work for one-dimensional lattices, the basic idea is readily extended to more realistic spatial settings.

In two dimensions, a regular lattice is a grid of nodes with each node connected to its four nearest neighbors. The expected path length between any two nodes selected at random scales with n^(1/2) and, in general in a D-dimensional lattice, path lengths will scale with n^(1/D). Spatial small-world models, rather than rewiring the lattice, typically introduce additional “shortcut” links, with the probability of the shortcuts dependent on the distance between the vertices they join. The probability might be proportional to d_ij^(-δ), where δ is a parameter chosen in a particular case. As δ is increased while holding constant the overall number of additional links added, the networks produced by models of this kind transition from random to small-world to regular lattice properties. This is readily understood in qualitative terms. For low values of δ, any length of shortcut is equally likely – in effect, nodes are undifferentiated from one another – and the network has distance properties similar to a random network. High values of δ heavily penalize the provision of longer shortcuts, leaving the lattice’s overall n^(1/D) distance-scaling property intact. Many transportation networks lie somewhere on this continuum, depending on how we incorporate shortcuts such as urban orbital highways, high-speed rail links, and airline routes into the more densely connected local transport network.
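A minimal sketch of such a model, assuming Python with networkx and numpy: shortcuts are added to a two-dimensional lattice with acceptance probability proportional to d^(-δ). The lattice size, shortcut budget, and δ are illustrative values.

```python
# Sketch: a 2D lattice with distance-decayed shortcuts.
import numpy as np
import networkx as nx

side, n_shortcuts, delta = 20, 40, 2.0
G = nx.grid_2d_graph(side, side)            # nodes are (row, col) tuples
nodes = list(G.nodes())
rng = np.random.default_rng(0)

added = 0
while added < n_shortcuts:
    u, v = (nodes[i] for i in rng.integers(len(nodes), size=2))
    if u == v or G.has_edge(u, v):
        continue
    dist = np.hypot(u[0] - v[0], u[1] - v[1])
    if rng.random() < dist ** -delta:       # longer shortcuts are penalized
        G.add_edge(u, v)
        added += 1

# Low delta pushes this toward random-graph distances; high delta leaves
# the lattice's n^(1/2) scaling essentially intact.
print("mean path length:", nx.average_shortest_path_length(G))
```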

4.3

Growing Spatial Networks: Preferential Attachment

The examples above apply a connection or rewiring rule to a preexisting set of nodes. Arguably, a more realistic approach is to grow a spatial network from an initial individual node, by progressive addition of new nodes and edges, according to some rules governing how new nodes are attached to existing ones. Once again, the baseline case is a nonspatial network growth model known as the preferential attachment model, attributable to Albert et al. (1999), which has spawned a large literature on “scale-free” networks. The basic idea is that nodes are added to a network and attach themselves preferentially to those nodes that already have larger numbers of connections. The resulting networks have heavy-tailed distributions of vertex degrees meaning that a small number of very strongly connected nodes dominate the network structure. Planar networks clearly cannot exhibit such characteristics, and physical constraints in most spatial networks prevent an unrestricted preference for attachment to the most well-connected existing nodes. Preferential attachment models which consider space require each new node to have a spatial location and the probability of attachment to existing nodes is then a function of both the degree of existing nodes and the distance between the new node and the existing nodes; this is similar to the attribute-distance geometric models considered previously but with progressive addition of nodes rather than an all-at-once calculation. Models of this general structure often produce the hub-and-spoke structures characteristic of many distribution networks. Again, as with spatial small-world networks, the rate at which the probability of a connection decays with distance is important in determining the overall characteristics of the resulting networks.
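A minimal sketch of a spatially constrained preferential attachment process, assuming Python with numpy and networkx: each arriving node links to one existing node with probability proportional to (degree + 1) × d^(-α). The +1 offset, the single link per new node, and α = 1.5 are illustrative choices, not taken from any published model.

```python
# Sketch: growing a network by degree- and distance-weighted attachment.
import numpy as np
import networkx as nx

rng = np.random.default_rng(1)
alpha, n_nodes = 1.5, 100

G = nx.Graph()
G.add_node(0)
xy = {0: rng.uniform(0, 1, 2)}              # node locations in the unit square

for new in range(1, n_nodes):
    xy[new] = rng.uniform(0, 1, 2)
    existing = list(G.nodes())
    w = np.array([(G.degree(i) + 1) *
                  np.linalg.norm(xy[new] - xy[i]) ** -alpha for i in existing])
    target = existing[rng.choice(len(existing), p=w / w.sum())]
    G.add_edge(new, target)                 # also adds the new node

# Strong distance decay plus degree preference tends toward the
# hub-and-spoke structures described in the text.
```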

4.4

Dual Graphs: New Graphs from Old

An important idea for models of networks is the dual transformation, whereby an initial graph is transformed into a new graph by switching between nodes and edges or (in a planar graph) between faces and nodes. The line graph L(G) of G is the graph whose vertices correspond to the edges of G and in which two vertices are joined when their corresponding edges are adjacent. This dual transformation is shown in Fig. 7 and, as in the case illustrated, results in a denser graph with more variety in the vertex degree distribution than the original “primal” graph. The line graph dual transformation is often applied to the more obvious primal network representation of a system, such as the road intersection and segment network, because the richer structure provides more opportunities for insight into key features of the network. Figure 7a–d shows a simple example. In a planar graph, a similar dual transformation entails treating each face of the graph as a vertex in a new graph and joining those vertices whose faces are adjacent in the original graph. This is the relationship between the familiar Voronoi tessellation and the Delaunay triangulation (see Okabe et al. 2000).
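networkx exposes this transformation directly; the following minimal sketch builds a small primal graph and compares its degree sequence with that of its line graph dual, which is denser and more varied, as the text describes.

```python
# Sketch: a primal graph and its line graph dual.
import networkx as nx

G = nx.Graph([(0, 1), (1, 2), (1, 3), (2, 3), (3, 4)])   # primal graph
L = nx.line_graph(G)                     # vertices of L are the edges of G

print(G.number_of_nodes(), "primal vertices ->", L.number_of_nodes(), "dual vertices")
print("primal degrees:", sorted(d for _, d in G.degree()))   # [1, 1, 2, 3, 3]
print("dual degrees:  ", sorted(d for _, d in L.degree()))   # [2, 2, 3, 3, 4]
```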

5

Matrix and Adjacency List Representation of Graphs

How do researchers encounter network data? There are two common formats: full matrices and edge lists. Which is preferable is largely a function of the graph density and software input requirements.

Fig. 7 Different graphs from the same network: (a) the line graph dual transformation, white squares and gray lines are the original graph, black circles and dashed lines are the line graph, (b) a primal graph for a road network, (c) line graph from the same road network, and (d) named road graph from the same road network. (Source: Created by authors using synthetic data)

An obvious approach, given its close relationship to spatial weight matrices, is the graph adjacency matrix A(G) = [a_ij], where a_ij = 1 if the edge e_ij exists and zero otherwise. The row order of A is unimportant, but the row and column ordering must be the same. The incidence matrix B is an alternative matrix representation that records the incidence of edges and vertices; it is an n × m matrix where b_ij = 1 if e_j is incident with v_i, and zero otherwise. A useful relationship between the incidence matrix and the adjacency matrix is that

B B^T = A   (10)

where A is the adjacency matrix, modified such that the elements on the main diagonal are equal to the degree of the corresponding vertex, and B^T represents the transpose of B. Another useful transformation is that the adjacency matrix A(L) of the line graph of G is given by B^T B − 2I_m, where B is the incidence matrix of G as defined above and I_m is the m × m identity matrix.

However, many spatial networks have very low densities. This makes adjacency matrices an inefficient representation, because many zero entries are stored even though they record no useful information. Therefore, an adjacency list or edge list representation is often more appropriate; it simply consists of a list of all the edges in the graph. Depending on the implementation details, it may also be necessary for vertices to be explicitly listed, or for the number of vertices in the graph to be stored, before the edges are listed. Appropriate modifications of such data structures can readily accommodate directed or weighted graphs. Many tools and software platforms used for the analysis of graph data make use of both matrix and adjacency list representations, and given the sparseness of many spatial networks, it is important that an efficient sparse matrix implementation be available. The edge list is usually more readily implemented in traditional GIS platforms and DBMS, as the columns can represent the attributes of the edge, e.g., the magnitude of the edge weight, edge distance, etc. More recently, graph databases (such as Neo4j) have come to accommodate adjacency queries directly, while network analysis software and packages often accept either type. For many analyses, the ability to quickly convert between (dense or sparse) matrix and adjacency list representations is necessary for efficient analysis. Conversion between types is conceptually trivial, but analysis workflows require careful planning when large graphs are involved.
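The identities above are easy to verify numerically. The sketch below, assuming Python with numpy, builds B and A from a toy edge list and checks both Eq. (10) and the line-graph relationship; the four-vertex graph is an arbitrary example.

```python
# Sketch: incidence and adjacency matrices from an edge list.
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]     # edge list for a 4-vertex graph
n, m = 4, len(edges)

B = np.zeros((n, m), dtype=int)              # incidence matrix, n x m
for j, (u, v) in enumerate(edges):
    B[u, j] = B[v, j] = 1

A = np.zeros((n, n), dtype=int)              # adjacency matrix
for u, v in edges:
    A[u, v] = A[v, u] = 1

# Eq. (10): B B^T is A with vertex degrees on the main diagonal.
assert np.array_equal(B @ B.T, A + np.diag(A.sum(axis=1)))

# The line graph's adjacency matrix falls out of the same incidence matrix.
A_line = B.T @ B - 2 * np.eye(m, dtype=int)
```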

6

Properties of Real-World Spatial Networks

Armed with the measures and models introduced above, it is possible to investigate the properties of real-world spatial networks. This remains an active area of research in many fields, and we restrict the discussion in this section to pointing to interesting examples, and useful review materials, which enable a rapid introduction to specific fields.

6.1

Road Networks

Road networks are the most immediately obvious networks encountered in everyday life. The primal representation of a road network is one where vertices are the road intersections and edges are the road segments. Across large areas of a given city, the road network approximates a two-dimensional lattice, and the range of vertex degrees is limited by geometry: it is unusual for road junctions to connect more than five or six road segments. However, when we range across larger scales, the road hierarchy in most regions introduces shortcuts in the form of highways with limited connections to the lattice of local roads. Spatial small-world networks capture this structure relatively well. More interesting features of road networks may emerge when the primal representation is converted to the line graph dual, or when named roads are treated as the units of analysis (i.e., the vertices in the graph) and intersections between named roads are the graph edges. Either of these transformations admits greater variety in the vertex degree distribution and can enable the topologically most central roads to be identified, which may be of greater interest than more traditional approaches in some cases. Perhaps the most interesting work in this area has been recent efforts to model a variety of urban street networks using rather simple models based on the preferential attachment principle but taking spatial constraints into account (Courtat et al. 2011).

6.2

Transport Networks

Transport networks cover a wide range of modes other than road-based networks. Relative to road networks, the most obvious feature of other modes is their point-to-point or station-connection structure. These features introduce greater potential for departures from planarity, particularly when airline networks and shipping routes are considered. In an analysis of an extensive database of world airline routes, Guimerà et al. (2005) demonstrate that this network has many interesting properties, including small-world characteristics and distinctive scales related to regional and global service areas. This paper and others focusing on airline networks, and the various findings in this area, are well covered in the spatial networks review paper by Barthélemy (2011, pp. 13–17). A comprehensive overview of developments in the analysis of all kinds of transport networks is provided by Rodrigue et al. (2009), where coverage extends beyond the more structural forms of analysis discussed in this chapter to how transport networks structure the regional and global economy, how they impact urban mobility, and related issues of transport policy and planning. The grounding of earlier work in real-world histories provides a striking contrast with recent work in a journal special issue “Evolution of Transportation Network Infrastructure” (see Levinson 2009), where more exploratory analyses of different network growth models are highlighted.

6.3

Other Spatially Embedded Networks

Given its importance in inspiring much of the recent explosion of work on networks, it is appropriate to point to work on the Internet and on telecommunications networks in general (including e-mail, phone calls, and social media connections). This is representative of a wide range of work on infrastructure networks of all kinds. It is easy to forget that the Internet relies, like other networks, on physical plant of various kinds, and that considerations such as efficiency of service provision, cost of installation, and vulnerability to disruption are as critical for the Internet backbone as they are for other infrastructure such as electricity and water supply. A comprehensive overview of network analysis work on the Internet is provided by Pastor-Satorras and Vespignani (2004). More geographically grounded perspectives attend to the spatial embedding of Internet infrastructure, examining how the structure of the Internet relates to local geographical factors and to other infrastructure networks, and often showing that places that are well connected by airlines, roads, and other systems tend also to be well provided with Internet connectivity (Malecki 2002). Once again, the interplay between exploratory analysis of overall structure and more grounded approaches is critical to progress in understanding this field.

Finally, we briefly consider spatially embedded social networks, perhaps the fundamental building block of all the other networks considered. An excellent overview of how space and social networks may be mutually reinforcing, and how these effects can be modeled, is provided by Butts and Acton (2011). They argue strongly for the benefits of analysis that attends to both network aspects and spatial aspects. Among the most promising areas for future development in this field are coevolutionary networks, where network structures and the attributes of nodes and edges mutually influence one another over time, and the wide-ranging study of how processes such as disease spread or the diffusion of ideas occur on networks (see Newman 2010, pp. 627–676).

6.4

Network Routing

Today, location-based services (LBS) have led to new research on the benefits and costs of routing algorithms. Network edges are being given new numerical variables of cost and benefit beyond trip duration and length. An example is the ability to visualize and follow real-time traffic from smartphone location aggregation services, such as Google Maps. Other examples raise ethical questions. For example, a pedestrian routing patent was created to automatically route users around places deemed “dangerous,” per an undisclosed mixture of crime and demographic data (Thatcher 2013). The flexible redefinition of the meaning of navigation and routing made possible by these methods holds great potential, although many applications also raise questions about new forms of social control often unwittingly granted to such services by their users.

7

Conclusions

Many, perhaps most, features of the human world can be considered to be embedded in space and networked to one another at various spatial scales either (more or less) permanently or in constantly changing ways. This chapter has deliberately focused on basic concepts and models that are useful for the analysis of such networks, particularly emphasizing the rapid growth of ideas in the recently emerged “science of networks.” While much of this material has been developed in statistical physics and allied fields, it is apparent that the insights yielded by these approaches build on much earlier work on networks in geography and regional science, extending it and applying fundamental ideas to larger and more dynamic networks than before. Even so, claims that work in these areas heralds a new dawn for the social sciences seem overstated. On the contrary, it is probable that the best and most insightful work will continue to demand the application of measures, methods, and models from network science reviewed here, in combination with detailed, well-grounded empirical research on the development and structure of networks in specific contexts in space and time.

8

Cross-References

▶ Complexity and Spatial Networks
▶ Social Network Analysis
▶ Supply Chains and Transportation Networks

References

Albert R, Jeong H, Barabási AL (1999) Diameter of the world-wide web. Nature 401(6749):130–131
Andris C, Liu X, Ferreira J Jr (2018) Challenges for social flows. Comput Environ Urban Syst 70:197–207
Barthélemy M (2011) Spatial networks. Phys Rep 499(1):1–101
Bergmann L, O’Sullivan D (2018) Reimagining GIScience for relational spaces. Can Geogr/Le Géographe Canadien 62(1):7–14
Black W (2003) Transportation: a geographical analysis. Guilford Press, New York
Butts CT, Acton RM (2011) Spatial modeling of social networks. In: Nyerges T, Couclelis H, McMaster R (eds) The Sage handbook of GIS and society research. SAGE, Los Angeles, pp 222–250
Castells M (1996) The rise of the network society. Blackwell, Malden
Courtat T, Gloaguen C, Douady S (2011) Mathematics and morphogenesis of cities: a geometrical approach. Phys Rev E 83(3):036106
Dash Nelson G, Rae A (2016) An economic geography of the United States: from commutes to megaregions. PLoS One 11(11):e0166083
Dijkstra EW (1959) A note on two problems in connexion with graphs. Numer Math 1(1):269–271
Erdös P, Rényi A (1960) On the evolution of random graphs. Publ Math Inst Hung Acad Sci 5:17–61
Fischer MM (2004) GIS and network analysis. In: Hensher DA, Button KJ, Haynes KE, Stopher PR (eds) Handbook of transport geography and spatial systems, Volume 5 of Handbooks in transport. Elsevier Ltd., Kidlington, pp 391–408
Fischer MM, Wang J (2011) Spatial data analysis: models, methods and techniques. Springer Science & Business Media, Heidelberg
Fortunato S, Hric D (2016) Community detection in networks: a user guide. Phys Rep 659:1–44
Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci U S A 99(12):7821–7826
Gross JL, Yellen J (2006) Graph theory and its applications. Discrete mathematics and its applications. Chapman & Hall/CRC, Boca Raton
Guimerà R, Mossa S, Turtschi A, Amaral LN (2005) The worldwide air transportation network: anomalous centrality, community structure, and cities’ global roles. Proc Natl Acad Sci U S A 102(22):7794–7799
Haggett P, Chorley RJ (1969) Network analysis in geography. Edward Arnold, London
Kansky K (1963) Structure of transportation networks: relationships between network geometry and regional characteristics. PhD thesis, Department of Geography, University of Chicago
Levinson D (2009) Introduction to the special issue on the evolution of transportation network infrastructure. Netw Spat Econ 9(3):289–290
Lorrain F, White HC (1971) Structural equivalence of individuals in social networks. J Math Sociol 1(1):49–80
Malecki EJ (2002) The economic geography of the internet’s infrastructure. Econ Geogr 78(4):399–424
Newman M (2010) Networks: an introduction. Oxford University Press, Oxford
Okabe A, Boots B, Sugihara K, Chiu SN (2000) Spatial tessellations: concepts and applications of Voronoi diagrams, 2nd edn. Wiley, Chichester
Oliver D (2016) Spatial network data: concepts and techniques for summarization. Springer Nature, Basel
Pastor-Satorras R, Vespignani A (2004) Evolution and structure of the internet: a statistical physics approach. Cambridge University Press, Cambridge, UK
Rodrigue JP, Comtois C, Slack B (2009) The geography of transport systems, 2nd edn. Routledge, London
Sobolevsky S, Szell M, Campari R, Couronné T, Smoreda Z, Ratti C (2013) Delineating geographical regions with networks of human interactions in an extensive set of countries. PLoS One 8(12):e81707
Taylor PJ, Ni P, Derudder B, Hoyler M, Huang J, Witlox FE (2012) Global urban analysis: a survey of cities in globalization. Earthscan, London/Washington, DC
Thatcher J (2013) Avoiding the ghetto through hope and fear: an analysis of immanent technology using ideal types. GeoJournal 78(6):967–980
Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge University Press, Cambridge, UK
Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393(6684):440–442
Weiser M (1991) The computer for the twenty-first century. Sci Am 265(3):94–104
Wilson RJ (1996) Introduction to graph theory. Longman, Harlow

Cellular Automata and Agent-Based Models

Keith C. Clarke
Department of Geography, University of California, Santa Barbara, Santa Barbara, CA, USA
e-mail: [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_63

Contents
1 Introduction
2 Complexity and Models of Complexity
3 Cellular Automata: Origins
4 Cellular Automata: Key Contributions
5 Cellular Automata: Applications
6 Agent-Based Models: Origins
7 Agent-Based Models: Key Contributions
8 Agent-Based Models: Applications
9 Conclusion
References

Abstract

Two classes of models that have made major breakthroughs in regional science in the last two decades are cellular automata (CA) and agent-based models (ABM). These are both complex systems approaches and are built on creating microscale elemental agents and actions that, when permuted over time and in space, result in forms of aggregate behavior that are not achievable by other forms of modeling. For each type of model, the origins are explored, as are the key contributions and applications of the models and the software used. While CA and ABM share a heritage in complexity science and many properties, nevertheless each has its own most suitable application domains. Some practical examples of each model type are listed and key further information sources referenced. In spite of issues of data input, calibration, and validation, both modeling methods have significantly advanced the role of modeling and simulation in geography and regional science and gone a long way toward making models more accountable and more meaningful at the base level.

Keywords

Cellular automaton · Residential segregation · Geographic information system · Regional science

1

Introduction

Models are simplifications of real-world systems that are amenable to tests and simulations of the reactions of the real systems to changes in their state and function. For extant and complicated regional systems, such as the United States Interstate Highway system, experiments on society would be unacceptable (closing highways to measure traveler delays, for example) – yet the computer allows such experiments in silico. Models, of course, are only of value if their structures are based on knowledge or data about an actual system and if they give results which are reasonable and credible. Foremost among the challenges of modeling are the fine-tuning of models so that they achieve the best results (calibration), the meaningful conversion of a system’s components into structural and behavioral equivalents within the model (design), the model’s effective use of computing power (tractability), the ability to match actual or expected results (performance), and the ability to create accurate predictions (validity). Regional science has employed a large number of modeling approaches over time, yet in the last three decades, two paradigms of modeling have emerged that have made achievements against these challenges and that have led to breakthroughs in model performance and accuracy for regional systems. These two approaches are cellular automata (CA) models and agent-based models (ABM).

In this chapter, we examine these two modeling approaches. Both have been termed “individual-based modeling approaches” in ecology, and this reflects the fact that both types of models are bottom-up – that is, they model the primitive or elemental level of behavior associated with a system. Aggregate patterns are achieved by summing the results of many individual actions, which has led to the related terms “disaggregated models” and “micro-simulation models.” Both these approaches are similar in that they are simple, easy to program and implement, and use an iterative approach. Both require initial conditions to be set and have challenges around calibration procedures. Cellular models are preferred when geographic space can be represented in the form of a geographic grid, such as the cells in a raster Geographic Information System. They are also favored when model states and the probabilities of transitions among those states are known and stable. They are most suitable for dissipative processes, such as land use change and urban growth. On the other hand, agent-based models are superior when the basis of a model is a behavioral unit, such as a person, household, business, landholder, or farmer (the “agent”), and when the modeled process consists of interactions over time among one or more types of agents that produce a spatial form, such as land use, crop choice, or habitat type.

It has been said that the two modeling forms differ only in the fact that in CA the agents remain in place and interact only with their neighbors. This statement, however, ignores both major and subtle differences between the two modeling approaches, such as their means of calibration and validation. We return to this contrast in the concluding section.

2

Complexity and Models of Complexity

Complex systems theory was originally developed in physics and has origins in Lorenz’s work on weather forecasting, which in turn reflects chaos theory and work on the three-body problem by Poincaré in 1890. Initially, Lorenz observed that a system’s behavior in the long term reflects the initial conditions of the system, such as the locus of an attraction point being a function of where a point subject to the attraction started its path. The values of variables that separate different system behaviors are called thresholds, and crossing them leads to nondeterministic and nonlinear behavior. Complexity is that behavior phase which is neither static nor deterministic. An early demonstration of complexity was in sand piles. When sand is poured from a nozzle, it forms a pile, which grows in a simple linear fashion. However, at some point in its growth, the sides of the pile are subject to failure. Even though the exact failure gradient is known, it is impossible to tell when a failure will take place and how much of the sand pile it will take down. Such behavior has been called self-organized criticality. As chaos and complexity theory became better known, largely due to the Santa Fe Institute and the work of scholars like Murray Gell-Mann and John Holland, applications in many different fields became commonplace. Complexity has a natural link to the science of fractals and self-similarity, as noted by Batty (2000). Many of the fields that adopted the complex systems approach were related to physical geography, such as meteorology, fire modeling, and ecological succession. However, Batty and Longley’s (1994) demonstration of the fractal nature of cities led to some degree of acceptance within urban and human geography. Many systems in human geography exhibit complexity, including land use change, residential segregation, urban growth, road network growth, and intercity interactions.

Important concepts in complexity theory are that dynamical systems – those subject to feedbacks – exist in three aggregate states or phases: chaos, stability, and complexity. In chaos, no discernable rules, structures, or even heuristics apply, such as in the business cycle or the stock market. In stability, behavior is linear or can be modeled by polynomials, that is, the change is differentiable and solvable with differential equations, equilibrium theory, and optimization. Complexity, however, is marked by periods (time) or subregions (space) of both stability and chaos. A system can move from one aggregate behavior state to another (a phase change), but each behavior type is robust (resilient) against perturbation to some degree (Waldrop 1993). Tipping a system beyond a threshold provokes a phase change, and the system then trends away from the original state. An example often used is a lake, which is subject to inputs of phosphates.

The ecosystem of the lake is able to counter the impact of the phosphates up to a certain concentration. Beyond that, even by a fraction, the lake cannot return to its initial state, and eutrophication takes place, leading to a new ecosystem based on the higher phosphate levels and different plant and animal species. An unknown, possibly large, proportion of human and natural systems exhibit such complexity. The attraction of both cellular automata and agent-based models is that they represent some of the simplest frameworks possible for demonstrating complex systems behavior. Largely for this reason, the models were quickly adopted and used to test many new types of urban and economic systems models. John Holland has suggested a defining condition for identifying complex systems and complexity, which he has termed emergence (Holland 1998). Emergence has been criticized as too subjective a criterion by which to identify complexity but is said to exist in a system when new and unpredicted patterns or global-level structures arise as a direct result of local-level procedures. The structure or pattern that emerges cannot be understood or predicted from the programmed or assumed behavior of the individual units alone. An example of emergence in CA is the glider (see section “Cellular Automata: Origins”). An example in the SLEUTH CA urban model (Clarke et al. 2007) is the aggregation of new settlements at the junctions of roads, a behavior nowhere inherent in the model’s programmed behavior.

3

Cellular Automata: Origins

A cellular automaton (plural cellular automata) is a discrete model, originally theoretical, but now implemented in disciplines from physics to biology, geography to ecology, and computer science to regional science. CA have been defined as “discrete spatio-temporal dynamic systems based on local rules” (Miller 2009). As noted above, they are the simplest modeling framework in which complexity can be demonstrated. Using CA, extremely complex behavior and emergence can be demonstrated with terse conditions and minimal rules. They are inherently attractive as spatial models because they map closely onto the raster grid in a geographic information system, because they use only local interactions among cells, and because of their simplicity. Nevertheless, they are capable of modeling and simulating extraordinarily complex behavior (Batty 2000) and of demonstrating emergence. A CA has four elements: (i) a grid of cells, each of which can assume a finite number of states; (ii) a neighborhood, over which a change operator applies, usually the Moore (8-cell) neighborhood surrounding a cell in the grid; (iii) a set of initial conditions, that is, an instance of the states for each and every cell in the system; and (iv) one or more rules, which when applied change the state of a cell based on properties or states of the neighborhood cells. The model advances by applying the rules to every cell one at a time, then swapping the changed grid with the initial grid, and by repeating this procedure.

CA were invented by Stanislaw Ulam, while he was employed at the Los Alamos National Laboratory in the 1940s. At the same time, John von Neumann was working on the problem of self-replicating systems.

Von Neumann proposed the kinematic model, a robot that could rebuild itself from spare parts. Ulam recommended that von Neumann develop his idea around a mathematical abstraction, such as the one he was using to study crystal growth on a lattice network. Like Ulam’s lattice network, von Neumann’s cellular automata used a two-dimensional grid, with his self-replicator implemented algorithmically, working within a CA with a 4-cell neighborhood and with 29 states per cell. This CA is now termed a von Neumann universal constructor. At about the same time, Norbert Wiener and Arturo Rosenblueth developed a CA model and mathematical description of impulse conduction in cardiac systems, implying broad applicability of the theory. By the 1960s, CA were being studied as a simplification of dynamical systems – models developed to simulate natural systems with feedbacks, such as air flow, turbulence, and weather, and human systems such as cities and economies. In 1969, Gustav Hedlund compiled many CA results into a seminal paper on the mathematics of CA (Hedlund 1969).

Nevertheless, CA remained largely a mathematical curiosity until John Conway’s creation of a CA game, the Game of Life. Martin Gardner drew popular attention to the game in a 1970 issue of his games column in Scientific American (Gardner 1970). Life is a two-state, two-dimensional CA with only four rules: (i) any live cell with fewer than two live neighbors dies (death), (ii) any live cell with two or three live neighbors remains alive (survival), (iii) any live cell with more than three live neighbors dies (overcrowding), and (iv) any dead cell with exactly three live neighbors becomes alive (birth). Despite the game’s simplicity, it can create astonishing variety in its long-term patterns. An “emergent” phenomenon is the “glider,” a cell arrangement that perpetuates itself by continuous movement across the grid (Fig. 1). It is possible to arrange the automata so that gliders interact to perform computations, and it has been proven that the Game of Life can emulate a universal Turing machine, thus completing von Neumann’s line of research.
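The four rules translate almost directly into code. The following minimal sketch, assuming Python with numpy, advances the Game of Life on a grid with wraparound edges (a convenience assumption) and moves a glider four steps.

```python
# Sketch: one update rule for Conway's Game of Life on a toroidal grid.
import numpy as np

def life_step(grid):
    # Count live cells in the Moore (8-cell) neighborhood of every cell.
    nbrs = sum(np.roll(np.roll(grid, dr, axis=0), dc, axis=1)
               for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0))
    # Survival with 2 or 3 live neighbors; birth with exactly 3.
    alive = (grid == 1) & ((nbrs == 2) | (nbrs == 3))
    born = (grid == 0) & (nbrs == 3)
    return (alive | born).astype(int)

grid = np.zeros((10, 10), dtype=int)
for r, c in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:   # a glider
    grid[r, c] = 1
for _ in range(4):                 # after 4 steps the glider has moved
    grid = life_step(grid)         # one cell diagonally
```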

4

Cellular Automata: Key Contributions

In the 1980s, Stephen Wolfram published a series of papers systematically investigating an unknown class of one-dimensional cellular automata, which he called elementary cellular automata (Wolfram 1986). The demonstration that what is now termed “complex systems behavior” can be simulated with the simplest of CA led to a host of explorations within the social and physical sciences into the range of what CA could simulate. Wolfram continued this work and in 2002 published A New Kind of Science (Wolfram 2002). In the book, Wolfram argues that discoveries about cellular automata are not isolated facts but have significance for all disciplines of science. Using a one-dimensional CA, Wolfram demonstrated that virtually any mathematical function can be simulated, and he explored applications across disciplines. Wolfram proposed a four-class set of possible CA. In Class 1, nearly all patterns quickly evolve into a stable homogeneous set and randomness disappears. In Class 2, nearly all patterns quickly evolve into an oscillating structure, with some randomness remaining.

Fig. 1 A 2D cellular automaton Game of Life. Configuration shows a glider gun, a cell form that remains static and sends out streams of gliders. Gliders and glider guns are emergent behavior from the simple rules of the Game of Life

In Class 3, nearly all patterns evolve into pseudo-random or chaotic structures. Any regular structures are quickly eliminated by randomness, which dissipates through the entire system. In Class 4, nearly all initial patterns evolve into structures that interact in complex and interesting ways. Wolfram has conjectured that many Class 4 cellular automata are capable of universal computation. This has been proven for Conway’s Game of Life and for Wolfram’s Rule 110. Rule 110 is a unique achievement, defined as a one-dimensional CA that for the input neighboring configuration set {111, 110, 101, 100, 011, 010, 001, 000} yields the equivalent outputs {0, 1, 1, 0, 1, 1, 1, 0}. Of the 88 possible unique elementary cellular automata, Rule 110 is the only one for which Turing completeness has been proven, making it arguably the simplest known Turing complete system. Rule 110 exhibits Class 4 behavior, which is neither completely stable nor completely chaotic: localized structures appear and interact in various complicated-looking ways, demonstrating the properties of emergence and phase change.

There have been several attempts to place CA into other formally rigorous classes, inspired by Wolfram’s classification. For instance, Culik and Yu proposed three well-defined classes (and a fourth one for the automata not matching any of these), called Culik-Yu classes. From the perspective of geocomputation, Batty (2000) surveyed the variants of CA possible for simulating urban and similar systems. He pointed out that strict CA models are at one end of a computational spectrum and that at the other end are simple cell space models, really no different from raster grids with a finite set of states that transition over time. He distinguished between cell space models, which are not at all CA models in the strict sense, and the concept of relaxing the CA assumptions. Key among the relaxations is the incorporation of action-at-a-distance, which is excluded by strict CA’s use of the von Neumann or Moore neighborhoods only.
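Elementary automata like Rule 110 are equally terse in code. The sketch below, assuming Python with numpy, applies the output table quoted above to a one-dimensional array with periodic boundaries.

```python
# Sketch: Wolfram's Rule 110 as a one-dimensional CA.
import numpy as np

RULE_110 = {(1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
            (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0}

def step(cells):
    left, right = np.roll(cells, 1), np.roll(cells, -1)   # periodic boundary
    return np.array([RULE_110[(int(l), int(c), int(r))]
                     for l, c, r in zip(left, cells, right)])

cells = np.zeros(80, dtype=int)
cells[-1] = 1                        # a single live cell as initial condition
history = [cells]
for _ in range(40):                  # rows of `history` show the familiar
    cells = step(cells)              # interacting localized structures
    history.append(cells)
```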

CA development in modeling of urban areas and other geographical realms is covered in a literature review, and some useful information sources are listed (Batty 2000, p. 119). Beyond mathematics, CA applications have been less concerned with definitional rigor and more with making CA adjust to geographical variation. Approaches have included automatic learning methods to empirically derive rules from observed patterns, self-modification or rule changes triggered by aggregate system behavior, and the addition of “ghost” states that fall between strict classes (e.g., “urban,” “non-urban,” and “under development” as land uses). Sante et al. (2010) note eight ways that formal CA models have been modified for use in urban growth modeling: using irregular spaces, nonuniform cell spaces, extended neighborhoods, nonstationary neighborhoods, complex transition rules, nonstationary transition rules, adding growth constraints, and using irregular time steps. Theoretical work has also examined the synchronous versus asynchronous application of the rules.

Applications of CA models in regional science have been commonplace. Most frequently, the models work on land use maps, often simplified to urban and nonurban states. Rules are derived and models calibrated using past data states, that is, by hindcasting. Land use maps derived from remotely sensed data at different time periods have commonly been used as data inputs, and other data are often zoning restrictions, transportation networks, and topography. Geographic Information Systems are used to compile and georegister the map layers and to receive the modeling results. CA models have been applied at many scales, following the research on fractal urban forms pioneered by Michael Batty (Batty 2005), but most CA models use data at resolutions between 30 and 100 m. An early model by White and Engelen (1993) added action-at-a-distance by changing the Moore neighborhood assumption. Clarke et al. (1997) created the SLEUTH model, a CA that incorporated weighting of probabilities and self-modification, feedback from the aggregate to the local. Wu and Webster’s modeling of the rapid growth in Southern China was another significant contribution (Wu and Webster 1998). Sante et al. (2010) tabulated 33 urban CA models, compared their characteristics, and provided a useful summary of the theoretical and applied CA modeling surrounding geography, urban planning, and regional science. Silva has considered complexity theory in planning more generally, using CA as the specific example (Silva 2010).

Sante et al. (2010) also offered a classification of CA transition rules. Type I rules are those of classical CA, that is, transitions can only occur based on the states of neighboring cells. Type II rules are based on potentials or probabilities altered by the land or environmental status of a cell. Type III rules are pattern development rules, which adjust the states based on shape or the existence of a network, such as roads. Type IV rules use computational intelligence methods to determine the rules from prior system behavior; typical are case-based reasoning, neural networks, data mining, and kernel-based methods. Type V rules use fuzzy logic and uncertainty reasoning, while Type VI rules include those not compatible with Types I–V.
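As a toy illustration of a Type II rule, the sketch below (Python with numpy) raises a cell’s urbanization probability with its count of urban Moore neighbors and suppresses it on steeper slopes. The functional form and the weight k are invented for illustration; they are not taken from SLEUTH or any other published model.

```python
# Sketch: a probability-modified (Type II) urban growth transition rule.
import numpy as np

rng = np.random.default_rng(0)
urban = (rng.random((50, 50)) < 0.02).astype(int)   # sparse urban seeds
slope = rng.random((50, 50))                        # normalized slope, 0..1

def step(urban, slope, k=0.08):
    nbrs = sum(np.roll(np.roll(urban, dr, 0), dc, 1)
               for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0))
    p = k * nbrs * (1.0 - slope)     # more urban neighbors, flatter land
    return np.where(rng.random(urban.shape) < p, 1, urban)

for _ in range(20):                  # iterate; urban cells never revert
    urban = step(urban, slope)
```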

5

Cellular Automata: Applications

Examples of cellular models in popular use include DINAMICA (see: www.csr.ufmg.br/dinamica), SLEUTH (see: www.ncgia.ucsb.edu/projects/gig), and Simland (Wu 1998). Influential critical reviews of CA research include those by Batty (2005), Torrens and O’Sullivan (2001), and Benenson (2007). An important early theoretical framework was that of Takeyama and Couclelis (1997), and additional attempts at synthesis have been made by Benenson and Torrens (2004) and by Torrens and Benenson (2005). Nevertheless, interest in and use of the CA suite of models continues unabated, with applications of several of the models at many scales, across regions, for whole nations, and on all continents other than Antarctica.

A representative CA model that has been long-lived in relation to others is the SLEUTH model. SLEUTH is an acronym for the data input layers required by the model (Fig. 2). The model was developed by the author and a host of collaborators with funding support from the United States Geological Survey, the National Science Foundation, and the Environmental Protection Agency. There are three retrospectives on SLEUTH’s now 15 years of use (Clarke et al. 2007; Clarke 2008a, b). SLEUTH actually consists of two CA models tightly coupled together and coded within the same open source C-language program: the Clarke Urban Growth model and the Deltatron Land Use Change model.

Fig. 2 Input data for the SLEUTH model. Minimum layers are topographic slope, two land use maps, one exclusion map, four urban extent maps, two transportation maps, and a hill-shaded background image. Data shown are for the Environmental Protection Agency’s Mid Atlantic Regional Assessment study area

The former is a classic CA, using a Moore neighborhood and simple sequential rules (Fig. 3), but adding weighting for probabilities, Monte Carlo simulation, and self-modification – in which aggregates, such as the overall growth rate, feed back into the parameters controlling the rule sets. The latter differs in that it takes its input of quantity of transformation from the Urban Growth model and applies CA in change space rather than geographic space. In doing so, it relaxes the single time-step rule and allows persistence and aging of cells for longer than one time step. SLEUTH has over a hundred applications at many scales and for different cities worldwide. A typical forecasting result is shown in Fig. 4.

6

Agent-Based Models: Origins

Agent-based models (ABM) are a class of computational models for simulating the actions, behavior, and interactions of autonomous individual or collective entities, with the goal of exploring the impact of one agent or a behavior type on the system as a whole. Miller (2009) notes that the agents are independent units that attempt to fulfill a set of goals. The agents can be countries, landowners, residents, renters, farmers, shoppers, vehicles, or even people out for a walk. Unlike with CA, the purpose of ABM is often the exploration of variants in system behavior due to agent characteristics (such as the proportion of agents of different types) or rules, rather than resulting aggregate structures or maps. Multiagent models include more than one agent; for example, a habitat model may include plants, animals that eat the plants, and predators that eat the animals. ABMs combine game theory, complex systems theory, evolutionary programming, and stochastic modeling. In ecology and biology, ABMs are termed “individual-based models.”

Fig. 3 At each cycle in the CA model, taking the system from T0 to T1, five sets of behavior rules are enforced in sequence for each one “year” iteration of the model: spontaneous growth, f(slope resistance, diffusion coefficient); spreading center growth, f(slope resistance, breed coefficient); organic growth, f(slope resistance, spread coefficient); road-influenced growth, f(slope resistance, diffusion coefficient, breed coefficient, road gravity); and deltatron land use change

Fig. 4 Land use in the Santa Barbara, California, region. Top: in 1999, the base year for modeling with SLEUTH. Bottom: in 2030, as forecast by SLEUTH for the Santa Barbara Regional Impacts of Growth Study (2003). Black: unclassed; red: urban; yellow: agriculture; orange: rangeland; green: forest; blue: water; purple: wetland; tan: barren land

ABMs simulate the actions and interactions of multiple agents, in an attempt to emulate the overall system behavior and to predict the patterns of complex phenomena. Agents behave independently, but react to the environment, the aggregate properties of the system, and other agents. So, for example, a farmer agent in the Brazilian Amazon may clear land as he becomes more profitable, in response to a change in crop price, or because his neighbor is clearing his land. Agents are usually assumed to behave with bounded rationality, acting in their own interests such as reproducing, increasing profit, or increasing status, usually by simple heuristic rules. For example, the previously mentioned farmer may decide to have a child or build a house when profitability reaches a certain level. Agents can also “learn,” that is, avoid previously failed decisions while favoring successful ones. They can also adapt, that is, change behavior based on properties of the system. An ABM consists of (i) agents specified at specific model scales (granularity) and types; (ii) decision-making heuristics, often informed by censuses and surveys in the real world; (iii) learning or adaptive rules; (iv) a procedure for agent engagement, for example, sample, move, interact; and (v) an environment that can both influence and be impacted by the agents. Creating a model involves examining or surveying a system to extract the agents’ behavior and influential factors, quantifying these elements, then coding the model in an environment that allows control, examination of maps and time sequences, and metrics of system behavior and performance. Many ABMs are programmed in coding languages with Java being the most common, while others use one or more of the software tools, both open source and proprietary, in which the system and rules have to be specified. While there are many examples of software for ABM, relatively few of them are compatible with GIS or produce maps or images. Also of use is the ability to do Monte Carlo simulations and to let the models iterate to a steady state.


ABMs share their origins with CA in the work of von Neumann, Ulam, and Conway. A pioneering agent-based model in urban systems was Thomas Schelling's urban residential segregation model (Schelling 1971). Though not computational, the work embodied the basic concept of agent-based models: autonomous agents interacting within a fixed environment, with an observed aggregate outcome. In the 1980s, interest in game theory led to Robert Axelrod's experiments with the game "Prisoner's Dilemma," showing that strategies evolved and coevolved over time among players. Craig Reynolds' research on models of flocking behavior yielded the first biological agent-based models with embedded social characteristics. Modeling biological agents using ABM became known as artificial life, which led in turn to artificial societies, artificial cities, computational economics, and related fields. Important software tools for ABM were StarLogo, SWARM, and NetLogo in the 1990s, and since then Ascape, Repast, Anylogic, and MASON (Railsback et al. 2006). Examples of early models include Construct by Kathleen Carley and Sugarscape by Joshua Epstein and Robert Axtell; these explored the coevolution of social networks and culture, and social phenomena such as segregation, migration, pollution, sexual reproduction, combat, and the transmission of disease. An early book on ABM in social simulation was Nigel Gilbert and Klaus Troitzsch's Simulation for the Social Scientist (1999). A key research journal has been the Journal of Artificial Societies and Social Simulation.

7 Agent-Based Models: Key Contributions

A survey of the recent ABM literature is provided by Niazi and Hussain (2011). A key survey in geography was that of Parker et al. (2003), resulting from a workshop (Parker et al. 2002). Also influential was a series of papers published by the Santa Fe Institute (Gimblett 2002). Agent-based modeling has been extraordinarily interdisciplinary: ABM has been applied to model organizational behavior, logistics, consumer behavior, traffic congestion, building and stadium evacuation, epidemics, biological warfare, and population demography. In these cases, a system encodes the behavior of individual agents and their interconnections. In some geographical applications, the models have been informed by fieldwork, interviews, or censuses that are used to derive behavioral characteristics and choices using qualitative methods. Agent-based modeling tools are then used to test how changes in individual and collective behavior impact the system's aggregate behavior. In some cases, agents are allowed to learn from past choices, for example, by avoiding decisions with negative outcomes. The following ABM development environments include the ability to ingest, output, and use spatial data: Anylogic, Cormas, Cougaar (via OpenMap), Framsticks, Janus (using JaSIM), MASON, Repast, SeSAm, NetLogo, and VisualBots. Some of these, and other nonspatial packages, contain model libraries that include CA examples such as the Game of Life, HeatBugs, demographic models, epidemiological models, and flocking. Many include the means to display charts, graphs, and maps, and menus to input control variables and rules (Fig. 5).


Fig. 5 User interface for the Anylogic 6 agent-based modeling software. (Source: http://www.coensys.com/agent_based_models.htm)

8 Agent-Based Models: Applications

Recent topics in regional science that have been modeled with ABMs include crowd behavior during riots and outdoor events (Torrens 2012), innovation in businesses (Spencer 2012), commuting behavior (McDonnell and Zellner 2011), ecology and habitats, disease, and land use change. The most recent research on agent-based models has demonstrated the need for combining agent-based and complex network-based models. This has included a desire for models with reusable components, tools for proof of concept and design, descriptive agent-based modeling (the development of descriptions of agent-based models by means of templates), and a need for better validation. The latter point has been repeatedly used in critiques of ABM: their very nature makes calibration and validation using data descriptive of real systems rather difficult, if not impossible. For example, a programmed behavior type will indeed emerge when enough agents are given enough time, so experiments with agents could be considered circular reasoning. Truly emergent behavior or new knowledge should be unanticipated during model construction.

An example of a common ABM application is the Schelling model (Schelling 1971). This simple model of segregation, originally proposed as a game simulation, has been used as the basis of many agent-based models and theoretical discussions about ABMs.

Cellular Automata and Agent-Based Models

1763

Fig. 6 Israeli and Arab residential patterns of the Israeli towns of Yaffo and Ramle from the Israeli census, and their Schelling model simulations. (Source: Hatna and Benenson 2012)

The model illustrates how individual tendencies regarding neighbors can lead to segregation in cities. It has been used extensively to study residential segregation of ethnic groups, where agents represent householders who relocate within the city. A concise statement of the model is given by Benenson et al. (2009), who enumerate six behavior rules, assuming the model to be an ensemble of agents of two types, B and W, dispersed over a grid. The rules are: (i) at every iteration, the agents can move to a vacant cell; (ii) the decision to move to a cell is based on the fraction of agents of the opposite type in the neighboring cells; (iii) this fraction should be below a tolerance proportion; (iv) if this fraction is exceeded, then an agent moves; (v) the agent searches within a finite distance for cells that are below the tolerance threshold and, if none exists, does not move; (vi) vacated cells become available for other agents. In most applications, these rules produce residential segregation, depending on the two constants that must be determined (the tolerance proportion and the search distance). Benenson and his colleagues have repeatedly experimented with real-world data and the Schelling model. Figure 6 shows Jewish-Arab residential patterns of the Israeli towns of Yaffo and Ramle from the Israeli census, and their ABM equivalents (Hatna and Benenson 2012).
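The six rules translate almost line by line into code. The sketch below assumes a toroidal grid, a Moore (eight-cell) neighborhood, and invented values for the two constants (tolerance proportion and search distance); it illustrates the rule set rather than reproducing any published implementation.

```python
import random

SIZE, TOL, SEARCH, VACANCY = 30, 0.5, 5, 0.1   # illustrative constants
random.seed(0)
# Two agent types 'B' and 'W' dispersed over a grid; None marks a vacant cell
grid = [[None if random.random() < VACANCY else random.choice("BW")
         for _ in range(SIZE)] for _ in range(SIZE)]

def frac_opposite(x, y, kind):
    """Rule (ii): fraction of opposite-type agents among occupied neighbors."""
    occupied = opposite = 0
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == dy == 0:
                continue
            n = grid[(x + dx) % SIZE][(y + dy) % SIZE]
            if n is not None:
                occupied += 1
                opposite += n != kind
    return opposite / occupied if occupied else 0.0

def step():
    for x in range(SIZE):
        for y in range(SIZE):
            kind = grid[x][y]
            if kind is None or frac_opposite(x, y, kind) <= TOL:
                continue            # rules (iii)-(iv): stay if within tolerance
            for _ in range(50):     # rule (v): bounded search for a better cell
                nx = (x + random.randint(-SEARCH, SEARCH)) % SIZE
                ny = (y + random.randint(-SEARCH, SEARCH)) % SIZE
                if grid[nx][ny] is None and frac_opposite(nx, ny, kind) <= TOL:
                    grid[nx][ny], grid[x][y] = kind, None  # rules (i) and (vi)
                    break           # if no acceptable vacancy, the agent stays

for _ in range(30):
    step()
# Mean fraction of like-type neighbors: a crude index of the segregation
# that emerges from purely individual tolerance rules.
like = [1 - frac_opposite(x, y, grid[x][y])
        for x in range(SIZE) for y in range(SIZE) if grid[x][y]]
print(f"mean like-neighbor fraction: {sum(like) / len(like):.2f}")
```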

9 Conclusion

Cellular automata and agent-based models both represent a new approach to modeling, that of complex adaptive systems. In this approach, models are microsimulations, run at the atomic level, and aggregate behavior emerges as a consequence of large numbers of agent interactions.


The complex systems approach has favored CA and ABM over Forrester-type systems dynamics models and steady-state and equilibrium models. CA and ABM share in common their individual basis. In CA, the modeled entities are cells that remain static while spatial and other processes move across or through them. In ABM, the agents can move in space, interact with each other directly, and interact with other agent types. In both cases, a large number of independent, autonomous, lowest-level actors create the overall landscape. The models can include extra data, such as environmental control layers, and parameters that influence the agents, such as prices or demands.

CA models have been criticized as oversimplifications of reality, and those that have relaxed the rules have been criticized as not pure CA. CA are extremely sensitive to their initial conditions and consume large amounts of CPU time. Since most use square grids, they are subject to error due to incorrect choice of map projection and to directional bias. In the long run, they are subject to equifinality arguments, since in most cases all developable land becomes developed, regardless of the exact sequence of development. They have also been criticized as difficult to implement and data hungry.

Bithell et al. (2008, p. 625) have noted that ABM have the potential to create integrated models that cross disciplines, so that similar computational methods can be used to control the spatial search process, to deal with irregular boundaries, and to display changing systems where the "preservation of heterogeneity across space and time is important." They note that a principal challenge of ABM is to find sets of rules that best represent the beliefs and desires of humans as agents, so that they reflect the cultural context, yet still allow system exploration. Clifford (2008, p. 675) noted that ABMs are most appropriate where decisions or actions are distributed around specific locations, and where structure is seen as emergent from the interaction among individuals. For this new and exploratory modeling framework, he calls for "a rediscovery and reappraisal of the richness and depth of insight in the model-building enterprise more generally." Some have attempted to link ABM with other bodies of theory; for example, Neutens et al. (2007) have linked ABM and time-space geography, and Andersson et al. (2006) have attempted to link networks, agents, and cells to model urban growth. Lastly, O'Sullivan and Haklay (2000) have suggested that using ABM encourages a modeling bias toward an individualist view of the social world, thereby missing many forces that shape real economic and human systems top down, such as planning and government. Read (2010, p. 329) noted that agent-based models "sometimes provide only a veneer of, rather than substantive engagement with, social behavior."

Both CA and ABM have enjoyed popularity in the regional modeling arena in the last 20 years due to their simplicity, ease of use, and accuracy. When machine learning or optimization is involved, the models can produce simulations of excellent accuracy. However, the models are often only as good as the data with which they are trained or tested, and they are highly sensitive to the context of these data. Relatively few CA or ABM models have been widely ported across different applications. Even fewer have been rigorously tested for accuracy, repeatability, and parameter sensitivity and validated using independent data.
A major criticism of both model types is that while the simulations are accurate and engaging, they lack any causative description or policy-related link to actual system behavior. Thus, while they can create useful future scenarios or forecasts, the means by which the actual system can be steered toward that outcome is not forthcoming. CA models are best used for spatially distributed process simulation, such as spread and dispersal, and when the geometry, scale, and basic behavior of a system are known. ABM is suited to simulations with no prior precedent, no past data, or when system knowledge is absent. These applications are usually more exploratory than those in which CA are used. Nevertheless, both modeling methods have significantly advanced the role of modeling and simulation in regional science and have gone a long way toward making models more accountable and more meaningful at the base level. Their joint impact on research and understanding of human systems has been profound.

References

Andersson C, Frenken K, Hellervik A (2006) A complex network approach to urban growth. Environ Plan A 38(10):1941–1964
Batty M (2000) Geocomputation using cellular automata. In: Openshaw S, Abrahart RJ (eds) GeoComputation. Taylor and Francis, London, pp 95–126
Batty M (2005) Cities and complexity: understanding cities with cellular automata, agent-based models, and fractals. MIT Press, Cambridge, MA
Batty M, Longley P (1994) Fractal cities: a geometry of form and function. Academic, San Diego/London
Benenson I (2007) Warning! The scale of land-use CA is changing! Comput Environ Urban Syst 31(2):107–113
Benenson I, Torrens PM (2004) Geosimulation: automata-based modeling of urban phenomena. Wiley, New York
Benenson I, Hatna E, Or E (2009) From Schelling to spatially explicit modeling of urban ethnic and economic residential dynamics. Sociol Methods Res 37(4):463–497
Bithell M, Brasington J, Richards K (2008) Discrete-element, individual-based and agent-based models: tools for interdisciplinary enquiry in geography? Geoforum 39(2):625–642
Clarke KC (2008a) Mapping and modelling land use change: an application of the SLEUTH model. In: Pettit C, Cartwright W, Bishop I, Lowell K, Pullar D, Duncan D (eds) Landscape analysis and visualisation: spatial models for natural resource management and planning. Springer, Berlin, pp 353–366
Clarke KC (2008b) A decade of cellular urban modeling with SLEUTH: unresolved issues and problems, Chapter 3. In: Brail RK (ed) Planning support systems for cities and regions. Lincoln Institute of Land Policy, Cambridge, MA, pp 47–60
Clarke KC, Hoppen S, Gaydos L (1997) A self-modifying cellular automaton model of historical urbanization in the San Francisco Bay area. Environ Plann B Plann Des 24(2):247–261
Clarke KC, Gazulis N, Dietzel CK, Goldstein NC (2007) A decade of SLEUTHing: lessons learned from applications of a cellular automaton land use change model, Chapter 16. In: Fisher P (ed) Classics from IJGIS. Twenty years of the international journal of geographical information systems and science. Taylor and Francis/CRC Press, Boca Raton, pp 413–425
Clifford NJ (2008) Models in geography revisited. Geoforum 39(2):675–686
Gardner M (1970) Mathematical games: the fantastic combinations of John Conway's new solitaire game "life". Sci Am 223(4):120–123
Gilbert N, Troitzsch KG (1999) Simulation for the social scientist. Open University Press, Milton Keynes
Gimblett HR (2002) Integrating geographic information systems and agent-based modeling techniques for simulating social and ecological processes. Santa Fe Institute studies in the sciences of complexity. Oxford University Press, Oxford
Hatna E, Benenson I (2012) The Schelling model of ethnic residential dynamics: beyond the integrated–segregated dichotomy of patterns. J Artif Soc Soc Simul 15(1):6
Hedlund GA (1969) Endomorphisms and automorphisms of the shift dynamical system. Math Syst Theory 3(4):320–375
Holland JH (1998) Emergence: from chaos to order. Addison-Wesley, Redwood City
McDonnell S, Zellner M (2011) Exploring the effectiveness of bus rapid transit: a prototype agent-based model of commuting behavior. Transp Policy 18(6):825–835
Miller HJ (2009) Geocomputation. In: Fotheringham AS, Rogerson PA (eds) The SAGE handbook of spatial analysis. Sage, London, pp 397–418
Neutens T, Witlox F, Van de Weghe N, De Maeyer PH (2007) Space-time opportunities for multiple agents: a constraint-based approach. Int J Geogr Inf Sci 21(10):1061–1076
Niazi M, Hussain A (2011) Agent-based computing from multi-agent systems to agent-based models: a visual survey. Scientometrics 89(2):479–499
O'Sullivan D, Haklay M (2000) Agent-based models and individualism: is the world agent-based? Environ Plan A 32(8):1409–1425
Parker DC, Berger T, Manson SM (2002) Agent-based models of land-use and land-cover change. LUCC report series no. 6. Indiana University, Bloomington
Parker DC, Manson SM, Janssen MA, Hoffmann MJ, Deadman P (2003) Multi-agent system models for the simulation of land-use and land-cover change: a review. Ann Assoc Am Geogr 93(2):314–337
Railsback SF, Lytinen SL, Jackson SK (2006) Agent-based simulation platforms: review and development recommendations. Simulation 82(9):609–623
Read D (2010) Agent-based and multi-agent simulations: coming of age or in search of an identity? Comput Math Organ Theory 16(4):329–347. Special issue
Sante I, Garcia AM, Miranda D, Crecente R (2010) Cellular automata models for the simulation of real-world urban processes: a review and analysis. Landsc Urban Plan 96(2):108–122
Schelling T (1971) Dynamic models of segregation. J Math Sociol 1(2):143–186
Silva EA (2010) Complexity and CA, and application to metropolitan areas. In: de Roo G, Silva EA (eds) A planner's encounter with complexity. Ashgate, Aldershot, pp 187–207
Spencer GM (2012) Creative economies of scale: an agent-based model of creativity and agglomeration. J Econ Geogr 12(1):247–271
Takeyama M, Couclelis H (1997) Map dynamics: integrating cellular automata and GIS through geo-algebra. Int J Geogr Inf Sci 11(1):73–91
Torrens PM (2012) Moving agent pedestrians through space and time. Ann Assoc Am Geogr 102(1):35–66
Torrens PM, Benenson I (2005) Geographic automata systems. Int J Geogr Inf Sci 19(4):385–412
Torrens PM, O'Sullivan D (2001) Cellular automata and urban simulation: where do we go from here? Environ Plann B Plann Des 28(2):163–168
Waldrop MM (1993) Complexity: the emerging science at the edge of order and chaos. Simon & Schuster, New York
White R, Engelen G (1993) Cellular automata and fractal urban form: a cellular modeling approach to the evolution of urban land-use patterns. Environ Plan A 25(8):1175–1189
Wolfram S (ed) (1986) Theory and applications of cellular automata. World Scientific, Singapore
Wolfram S (2002) A new kind of science. Wolfram Media, Champaign
Wu F (1998) SimLand: a prototype to simulate land conversion through the integrated GIS and CA with AHP-derived transition rules. Int J Geogr Inf Sci 12(1):63–82
Wu F, Webster CJ (1998) Simulation of land development through the integration of cellular automata and multi-criteria evaluation. Environ Plann B 25(1):103–126

Spatial Microsimulation

Alison J. Heppenstall and Dianna M. Smith

A. J. Heppenstall: School of Geography, University of Leeds, Leeds, UK (e-mail: [email protected])
D. M. Smith: University of Southampton, Southampton, UK (e-mail: [email protected])
© Springer-Verlag GmbH Germany, part of Springer Nature 2021. M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science. https://doi.org/10.1007/978-3-662-60723-7_65

Contents

1 Introduction
2 Defining Spatial Microsimulation
3 Static and Dynamic
4 Microsimulation as a Tool for Policy
5 Spatial Microsimulation Algorithms
5.1 Deterministic Reweighting
5.2 Conditional Probabilities
5.3 Simulated Annealing
6 Which Algorithm?
7 Case Study: Estimating Smoking Prevalence Locally
8 Conclusions
9 Cross-References
References

Abstract

Spatial microsimulation is an excellent option to create estimated populations at a range of spatial scales where data may be otherwise unavailable. In this chapter, we outline three common methods of spatial microsimulation, identifying the relative strengths and weaknesses of each approach. We conclude with a worked example using deterministic reweighting to estimate tobacco smoking prevalence by neighborhood in London, UK. This illustrates how spatial microsimulation may be used to estimate not only populations but also behaviors, and how this information may then be used to predict the outcomes of policy change at the local level.

Keywords

Simulated annealing · Smoking prevalence · Constraint variable · Microsimulation model · Synthetic population

1 Introduction

Social science research increasingly aims to explore the effects of government policy on individual behavior. Although detailed data are available about individuals from disparate surveys and sources (such as health records), what is often not known, or not readily available, is the spatial location of these individuals within a city or country. Microsimulation can link data from multiple sources to model behavior at the individual level and to identify groups within a population who may need additional support or assistance before changing an aspect of public policy. Microsimulation may also be applied spatially to estimate small-area population effects of public policy prior to implementation. In this way, governments may be better prepared for local changes that follow the rollout of a new tax or urban planning scheme. Additionally, knowledge of how particular behaviors vary spatially can lead to more efficient allocation of services: garbage collection in areas of highest waste generation and hospitals located in proximity to areas with more traffic accidents or greater prevalence of long-term illness.

The focus in this chapter is spatial, rather than individual, microsimulation. Spatial microsimulation creates a "synthetic" population (compiled from anonymized individual-level data) which realistically matches the population, as defined by the population census, in a geographical area for a given set of criteria (constraints). A diverse set of research and policy applications utilize spatial synthetic populations, including health (Brown and Harding 2002; Smith et al. 2011), transportation (Beckman et al. 1996), and water demand estimation (Williamson and Clarke 1996). The accurate representation of populations within geographical areas may have great significance for microsimulation modeling when the outputs are intended to inform policy.

The US and UK Censuses collect comprehensive sociodemographic data on individuals, but to protect confidentiality, data are aggregated to larger geographic scales. At a coarse geographic scale (such as a US state or county), more attribute detail is available, including cross tabulations of attributes such as the age-sex-ethnicity distribution of the population. At a fine geographic scale (e.g., UK output areas, with average populations of 250), detailed population attributes are not available, only univariate tables of age, sex, or ethnicity. The lack of detailed data at the local level has led to research focused on creating realistic synthetic populations within a predefined geographical area, estimating combinations of attributes and/or data not available within census datasets, effectively filling in the blanks.

Within this chapter, both microsimulation and spatial microsimulation are defined. Using the main application areas as examples, we highlight the strengths and weaknesses of the approach. The focus then moves on to the main algorithms used within spatial microsimulation: deterministic reweighting, conditional probabilities, and simulated annealing. An exemplar of the application of spatial microsimulation is provided through a case study involving the estimation of smoking prevalence in local areas within London, UK. Finally, a general discussion is presented offering suggestions for areas of future development.


2 Defining Spatial Microsimulation

Microsimulation is the generation, at time $t = 0$, of a population sample $P$ made up of $n$ individuals $[P^1, P^2, \ldots, P^n]$, where each individual $i$ has a number of initial attributes $[a_1^i, a_2^i, \ldots, a_m^i]$. The population is then updated to later times $t$, so the attributes of individual $i$ become functions of time, $[a_1^i(t), a_2^i(t), \ldots, a_m^i(t)]$. This allows the population to be used to model the effects of policy changes on individuals. While detailed statistics on each individual attribute are often available for the whole population (e.g., the number of males/females or the number of people in a given age band), information about the co-dependencies of attributes is not (e.g., how many people in the 20–30 age band are male/female), or there is no information on policy-relevant outcomes such as household expenditure on childcare, or individual-level health behavior such as physical activity or smoking. What is often available is a sample of the population which does contain all these attributes. In microsimulation, the initial population is often taken from population samples in large-scale surveys.

Spatial microsimulation is a type of microsimulation that recognizes the key role of geography in many of the processes being modeled. Each respondent from a survey dataset, which includes the attributes we want to estimate for the population, is given a probability (weight) to live in a specific location or spatial area (e.g., a ward), based on the known population structure of each area from the census. As discussed above, one of the key challenges for spatial microsimulation is the creation of a realistic initial population. Often population samples are only available at a coarse spatial resolution (e.g., at the country/regional level) and not at the fine spatial scale required. At the fine scale, only statistics on the individual attributes are available. Spatial microsimulation techniques therefore need to generate a realistic population in each area $j$. Mathematically, the population of area $j$ can be written as $P_j = \{w_{ij} P^i\}$, where the weight $w_{ij}$ represents the number of people in the population of area $j$ with the attributes of person $P^i$ from the sample or synthetically generated population. To ensure the proportions of each individual attribute are correct in the area, the $w_{ij}$ need to be chosen so that the mean absolute error (MAE), given by

$$\mathrm{MAE}_j = \sum_{km} \left| T_j^{km} - T_j^{km}(\mathrm{obs}) \right|,$$

is minimized for each area $j$. Here, $T_j^{km}$ is the modeled number of people in area $j$ with attribute $k$ taking the value $m$, and $T_j^{km}(\mathrm{obs})$ is the equivalent number observed. The modeled value is given by $T_j^{km} = \sum_i w_{ij} \delta_i^{km}$, where $\delta_i^{km}$ takes the value one if attribute $k$ takes the value $m$ for person $i$ and zero otherwise. Ideally, the MAE would be zero for all areas, but in practice, this is not always attainable given just an initial sample of the whole population. Much of the focus of the rest of this chapter will be on methods for obtaining the weights $w_{ij}$.
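The objective just defined can be stated compactly in code. The sketch below uses invented toy numbers for a single "sex" constraint in two areas to compute $T_j^{km}$ and the per-area MAE.

```python
import numpy as np

# Invented toy data: 4 sample individuals, 2 areas, one constraint ("sex")
# with categories m = 0 (male) and m = 1 (female).
delta = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])  # delta[i, m] for person i
w = np.array([[10.0, 5.0], [5.0, 5.0], [8.0, 2.0], [7.0, 8.0]])  # w[i, j]
T_obs = np.array([[15.0, 15.0], [12.0, 9.0]])       # observed T[j, m] (census)

T_mod = w.T @ delta                        # T[j, m] = sum_i w[i, j] delta[i, m]
mae = np.abs(T_mod - T_obs).sum(axis=1)    # one MAE value per area j
print("modeled totals:\n", T_mod)
print("MAE by area:", mae)                 # area 0 fits exactly; area 1 does not
```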

3 Static and Dynamic

Microsimulation models are categorized as either static or dynamic. In static microsimulation, rules (normally drawn from data analysis or the literature) are applied to a large representative sample to generate the synthetic demographic and economic characteristics expected at one point in time. Static spatial population simulations focus on the consequences that external information brings to the population; they do not model changes in the population itself. The defining characteristic of static microsimulation is that there is no direct change of the individuals within the model during the simulation time period; instead, the focus is on adding attributes to the existing population dataset. A typical "what-if?" scenario would be: "If there had been no poll tax in 1991, which communities would have benefited most and which would have had to have paid more tax in other forms?" (see Ballas et al. 2005; Gilbert and Troitzsch 2005). Further examples of static microsimulation in this area are presented in Table 1.

In dynamic microsimulation, individuals change their characteristics as a result of endogenous factors within the model; the populations update over time. Various degrees of direct interaction between micropopulation units can be found in dynamic microsimulations, for example, in processes such as birth and marriage. These models rely on accurate knowledge of the individuals and the dynamics of such interactions. In a dynamic microsimulation, the updating of the dynamic structure is performed by "aging" the population through the application of transition probabilities (i.e., what is the probability of an individual getting married or having a baby?). The changes in the population itself are modeled, with changes in an individual in one year having an effect on that individual's characteristics in subsequent years. A typical future-oriented dynamic microsimulation scenario would be as follows: "If the current government had raised income taxes in 1997, what would the redistributive effects have been between different socioeconomic groups and between central cities and their suburbs by 2011?" (O'Donoghue 2001; Ballas et al. 2005; Gilbert and Troitzsch 2005).

Static and dynamic microsimulation each have benefits and drawbacks. Static models tend to be simpler programs than dynamic models, and because they are less computationally demanding, simulations can be run quickly. There is a general acceptance that dynamic models provide a more realistic long-term estimate of individual-level behavior (O'Donoghue 2001). However, the process of generating realistic behavior involves potentially unlimited interactions/interdependencies of the individuals when updating; this can result in dynamic microsimulations being computationally demanding (Ballas et al. 2005).
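A minimal sketch of the dynamic "aging" step described above, applying annual transition probabilities to each individual; the probabilities and starting population are invented for illustration, not empirical rates.

```python
import random

random.seed(42)
P_MARRY = 0.08    # invented annual probability of an unmarried adult marrying
P_BIRTH = 0.05    # invented annual probability of a birth per married adult

population = [{"age": random.randint(18, 60), "married": False}
              for _ in range(1000)]

for year in range(2011, 2021):
    births = 0
    for person in population:
        person["age"] += 1                      # deterministic aging
        if not person["married"] and random.random() < P_MARRY:
            person["married"] = True            # stochastic transition
        if person["married"] and person["age"] < 45 and random.random() < P_BIRTH:
            births += 1
    # Newborns enter the population and can transition in later years,
    # so a change in one year affects characteristics in subsequent years.
    population += [{"age": 0, "married": False} for _ in range(births)]

print("final population:", len(population))
print("married share:", sum(p["married"] for p in population) / len(population))
```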

4 Microsimulation as a Tool for Policy

One of the most important advantages of microsimulation is that it enables us to examine the impact of policy changes on individuals even with sparse data. This distinguishes microsimulation from alternative methods, such as Bayesian estimation (Congdon 2006) (for details of general Bayesian methods, see Brunsdon, this volume) or multilevel modeling for small-area population estimation (Moon et al. 2007). These two approaches to small-area population estimation require cross-tabulated data at the spatial scale of the simulation output, limiting the scale of any simulations. The key advantage of microsimulation is that it has no such requirement and may be carried out from a series of univariate tables through an iterative process of recalculating weights. Spatial microsimulation has a further advantage over other microsimulation models in its ability to explore spatial relationships and to analyze the spatial implications of policy scenarios.

These advantages are reflected in the different applications where microsimulation has been applied. Table 1 presents an overview of the main subject domains and models within which microsimulation can be found, while Table 2 presents examples of spatial microsimulation models. Many of the models in Tables 1 and 2 share a common feature: they are concerned with the idea of "what-if?" simulations whereby the impact of new or alternative policy rules on the whole system or on individual parts/components can be assessed. A simple example might be, "What would happen to the economic situation of local households if there is a change in child benefit tax?"


Table 1 Name, origin, and description of examples of microsimulation models within the four main domain areas. (Adapted from Birkin and Wu 2012)

(a) Tax benefits
POLIMOD (UK): Demonstrates how VAT, National Insurance Contributions, and local taxes are calculated under different assumptions; entitlement to retirement pension and other non-means-tested social security benefits
STINMOD (Australia): Static microsimulation model of the tax and transfer systems. The rules of government programs are applied to individuals and aggregated to calculate outcomes for income units, families, or households
EUROMOD (Europe): Tax-benefit model that covers 15 countries. It provides estimates of the distributional impact of changes to personal tax and transfer policy at either the national or the European level

(b) Pensions
PRISM (UK): Dynamic microsimulation of income from social security, earnings, assets, public and private occupational pensions, and retirement savings plans
SfB3 (Germany): Analysis of pension reforms, the effect of shortening worker hours, distributional effects of education transfers, and interpersonal redistribution in the state pension system
DYNACAN (Canada): Projects the incidence, average levels, and variation in private pensions into the future as a function of birth year, age, and gender

(c) Health care
PBS (Australia): Expenditure on pharmaceuticals by different types of households, resultant government outlays under the Pharmaceutical Benefits Scheme, and the remaining patient co-payment contributions
LifeMOD (UK): Models the lifetime impact of the welfare state through examination of health status over the life course and implications for health-care financing in the UK
LifePaths (Canada): A dynamic longitudinal microsimulation model of individuals and families which simulates the discrete events that together constitute an individual's life history

(d) Transport
DRACULA (UK): Simulates the response of traffic to different network layouts and control strategies; measures network performance from outputs of average travel time, speed, queue length, fuel consumption, and pollutant emission
Paramics (US): Microscopic simulation of a range of real-world traffic and transportation problems, handling scenarios ranging from a single intersection to a congested freeway or the modeling of an entire city's traffic system
VisSim (Germany): Models traffic flow in urban areas as a discrete, stochastic, time step-based microscopic model, with driver-vehicle-units as single entities. The model contains a psychophysical car-following model for longitudinal vehicle movement and a rule-based algorithm for lateral movements (lane changing)


Table 2 Name, origin, and description of examples of spatial microsimulation models

SVERIGE (Sweden): Dynamic population model designed to study human eco-dynamics. Simulates spatial location and mobility of individuals. Developed for Sweden
SimBritain (UK): Dynamic simulation attempting to model the British population at different geographical scales up to the year 2021
HYDRA (UK): Grid-enabled decision-making support system for health service provision
SMILE (Ireland): Dynamic spatial microsimulation model designed to analyze the impact of policy change and economic development on rural areas in Ireland


5 Spatial Microsimulation Algorithms

There are several established methods used for spatial microsimulation, specifically: deterministic reweighting (Smith et al. 2011), conditional probability (Monte Carlo simulation) (Birkin and Clarke 1988), and simulated annealing (Openshaw 1995; Voas and Williamson 2001). These methods were selected due to their common application in geography (see Voas and Williamson 2001; Ballas et al. 2005). Further details of each of these algorithms, including mathematical derivations, can be found in Harland et al. (2012).

5.1 Deterministic Reweighting

The deterministic reweighting algorithm was introduced by Ballas et al. (2005) and has been widely used in microsimulation models for health-care research.


As described in Smith et al. (2011), the deterministic reweighting algorithm proportionally fits each individual record in the sample to the observed counts in each of the constraint tables (univariate tables which define the "known" population within an area based on a set of attributes), iterating until each of the constraint variables has been included. Each iteration of reweighting first applies Eq. (1) to calculate an initial new weight, which is subsequently reweighted to reflect the observed population count in the subsequent constraint tables using Eq. (2), and then a scaling factor is applied in Eq. (3):

$$nw_o^c = oldwt_c \times \frac{tot_o^c}{tot_s^c} \qquad (1)$$

$$NW_o^c = \sum nw_o^c \qquad (2)$$

$$nw_o^{c\,\prime} = nw_o^c \times \frac{tot_o^c}{NW_o^c} \qquad (3)$$

where $nw_o^c$ denotes the new weight for an individual in area $o$ under constraint subcategory $c$ (such as "male" in the gender constraint table); $NW_o^c$ is the total weight for all individuals in area $o$ in that subcategory; $tot_o^c$ is the total population in constraint subcategory $c$ in area $o$, as defined by a population census; $tot_s^c$ is the total population in constraint subcategory $c$ in the survey data; and $oldwt_c$ is the initial starting weight for the individual, based on predefined survey weights, or it may be set to equal one. The primed weight in Eq. (3) is the rescaled weight carried forward to the next constraint.

For each zone, the sample is cloned with the initial starting weights and then reweighted using the procedure outlined above. Each time an individual is "cloned" into an area, the additional attributes associated with them beyond the constraints are also included. Each person in the survey is given a weight, or probability, of living in each area based on the census constraint tables for each area, and this weight adjusts as the algorithm cycles through the constraints. Through this process, a more extensive population profile is made available. The local population profile will include decimal values of the weights, so outcomes are expressed as proportions of the base population rather than as replicated individuals with attributes.

Perfect matching between the synthetic totals and sample totals from the reweighting algorithm cannot be reached for every zone. The more dissimilar the characteristics of a zone are from the distribution of characteristics in the sample, the greater the resulting error, as this method assumes all of the areas are relatively homogeneous. A zone with a high ethnic minority population will differ from the average distribution of ethnic minority groups in the sample, making such a zone less likely to match the constraint table perfectly. Therefore, despite this algorithm having no stochastic element and being completely deterministic, the order in which constraints are applied can produce different resulting populations, as each new weight produced is a product of the weight calculated using the preceding constraint information. One way to adapt this algorithm so that the estimates are more accurate is to group together similar areas and run them through the model together (Smith et al. 2011). This is illustrated in the example at the end of this chapter.
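A sketch of this procedure for a single area, implementing Eqs. (1), (2), and (3) as reconstructed above for two constraints, with invented survey records and census totals; the final print shows how fitting the second constraint disturbs the fit to the first, which is why constraint order matters.

```python
import numpy as np

# Invented survey of five records: category codes under two constraints
sex = np.array([0, 0, 1, 1, 1])      # 0 = male, 1 = female
age = np.array([0, 1, 0, 1, 1])      # 0 = 18-64, 1 = 65+
survey_totals = {"sex": np.bincount(sex), "age": np.bincount(age)}  # tot_s^c
census_totals = {"sex": np.array([120.0, 180.0]),                   # tot_o^c
                 "age": np.array([210.0, 90.0])}

w = np.ones(len(sex))                # oldwt_c, here set to one
for name, cats in (("sex", sex), ("age", age)):
    # Eq. (1): scale each record by census total / survey total for its category
    nw = w * census_totals[name][cats] / survey_totals[name][cats]
    # Eq. (2): total weight per category; Eq. (3): rescale to match the census
    NW = np.bincount(cats, weights=nw)
    w = nw * census_totals[name][cats] / NW[cats]

print("weights:", np.round(w, 1), "area total:", w.sum())
# The age constraint now fits exactly, but the earlier sex constraint has
# drifted from (120, 180) -- the order sensitivity discussed in the text:
print("sex totals after both passes:", np.bincount(sex, weights=w))
```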

5.2 Conditional Probabilities

The conditional probabilities model is an adaptation of the synthetic estimation procedures first introduced by Birkin and Clarke (1988). It was originally designed to generate a synthetic population where no survey data existed. With the availability of more survey information, the algorithm has evolved to execute equally well using a sample.

The algorithm initiates by creating a population with the characteristics of the first constraint and the associated probability calculated from the constraint table. For example, suppose the first constraint is gender and the first geographical zone contains 120 males and 180 females. Then 120 individuals are created with an associated characteristic of male, and a further 180 individuals are created with a characteristic of female. Suppose the second constraint is marital status, with three categories: married, single, and divorced/widowed. The probability of married, single, and divorced/widowed people appearing in the zone is first derived from the sample: it is simply the count of individuals in each category of the second constraint, divided by the total number of individual records in the corresponding category of the first constraint. If the sample contained 1,000 records of which 400 were male, and of the males 160 were married, the joint probability of a male being married would be 160/400 = 0.4. The remaining joint probabilities are calculated for the male sample in the same way, giving p(male, married) = 0.4, p(male, single) = 0.4, and p(male, divorced/widowed) = 0.2.

The male portion of the synthetic population is then iterated through, with a random number greater than zero and less than or equal to one generated for each individual. If the random number is less than 0.4, a characteristic of married is added to that individual; if the random number is between 0.4 and 0.8, a characteristic of single is added; and finally, if the random number is greater than 0.8, a characteristic of divorced/widowed is added. This process is repeated for the female category from the first constraint.

Once all individuals in the current zone have been assigned the second constraint characteristics, the totals for each of the three categories are calculated and compared to the totals observed in the marital status constraint table. The initial starting probabilities are adjusted to represent the discrepancies between the observed constraint totals and those calculated from the synthetic population. For example, if 100 married people were expected in this zone but only 80 had been created, the initial male probability of being married, p(male, married), of 0.4 would be boosted using Eq. (4), as would the corresponding female probability, p(female, married):

$$np_c(x, y, z, \ldots, n) = op_c(x, y, z, \ldots, n) \times \frac{T_c^{\mathrm{constraint}}}{T_c^{\mathrm{synthetic}}} \qquad (4)$$

where $np_c(x, y, z, \ldots, n)$ is the newly calculated joint probability and $op_c(x, y, z, \ldots, n)$ the old or initial joint probability. $T_c^{\mathrm{constraint}}$ is the total number of individuals in category $c$ of the current constraint, and $T_c^{\mathrm{synthetic}}$ the total number of individuals in category $c$ of the synthetic population.


For the male population, Eq. (4) would become $0.5 = 0.4 \times \frac{100}{80}$, giving a new p(male, married) of 0.5. Each joint probability for the second constraint is adjusted and the characteristic assignment process repeated. This process is iterated until changes in the joint probabilities fall below a predefined threshold. Each constraint is added individually using this technique until all the required constraints have been incorporated, and then the next zone is calculated, starting from the initialization of the synthetic population using the values from constraint one.

Once this process is complete, the synthetic population can be saved if no additional attributes from the sample are required. However, if the researcher is interested in examining attributes from the sample not included in the constraints, usually because they are not available, then an additional final Monte Carlo sampling stage is required. For each individual in the synthetic population, the sample is entered at a random point and iterated through until the first record exactly matches the constraint attributes generated for the synthesized individual. When a matching record in the sample has been found, the extra attributes are copied to the synthetic record.
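A sketch of the boosting step of Eq. (4) using the counts from the worked example, followed by the random-number assignment of characteristics; the normalized probabilities in the assignment demonstration are illustrative values, not derived from any real survey.

```python
import random

def boost(p_old, T_constraint, T_synthetic):
    """Eq. (4): new probability = old probability * (observed / synthetic)."""
    return p_old * T_constraint / T_synthetic

# 100 married people expected in the zone, but only 80 created so far:
print(boost(0.4, 100, 80))           # -> 0.5, reproducing the worked example

# Monte Carlo assignment of a characteristic from cumulative probabilities,
# as used when building the synthetic population (illustrative values).
random.seed(3)
p_male = {"married": 0.5, "single": 0.4, "divorced/widowed": 0.1}

def assign(probs):
    r, cum = random.random(), 0.0
    for category, p in probs.items():
        cum += p
        if r <= cum:
            return category
    return category                  # guard against floating-point shortfall

males = [assign(p_male) for _ in range(300)]
print({c: males.count(c) for c in p_male})
```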

5.3 Simulated Annealing

As outlined by Davies (1987, p. 6), "Simulated annealing is a stochastic computational technique derived from statistical mechanics for finding near globally-minimum-cost solutions to large optimisation problems." The essence of the procedure is to start by creating a population as a random extract from the sample file; by aggregating over the various constraints, the goodness of fit of the population to the constraining tables can be evaluated. From this population, an individual member is selected at random and replaced with another individual that is also selected at random from the sample. The aggregation and goodness-of-fit evaluation is repeated, and if the fit is improved, then the new individual replaces the old.

The feature which distinguishes simulated annealing, in contrast to hill-climbing algorithms, for example, is the incorporation of the Metropolis algorithm, allowing both backward and forward steps to be taken when searching for an optimal solution (Otten and van Ginneken 1989). So even if the replacement leads to a deterioration in the model fit, it will be allowed by the model as long as a certain threshold is exceeded. This threshold is often characterized as a "temperature" step, or annealing factor, as this method was originally conceived as a means to simulate the annealing process by which metals are cooled. As the algorithm proceeds, the temperature thresholds are reduced, and backward steps become progressively more unlikely, so that ultimately only climbing moves are permitted toward an optimized outcome.

Simulated annealing is similar to deterministic reweighting to the extent that weights are applied to members of a sample. However, in simulated annealing, these weights are zero or one, representing selection or exclusion, whereas in deterministic reweighting, the weights are fractional. Simulated annealing is a heuristic hill-climbing algorithm rather than an iterative process (deterministic reweighting) or a sequential estimation method (conditional probabilities). One of the most important differences is that simulated annealing evaluates individual moves simultaneously against all of the constraining tables, whereas in both of the other techniques, this evaluation takes place constraint by constraint.
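A sketch of the annealing loop with Metropolis acceptance for a single zone, using invented sample records and constraint totals; the fitness function, starting temperature, and cooling schedule are illustrative choices rather than those of any published model.

```python
import math
import random

random.seed(7)
# Invented sample: records are (sex, age_band) pairs; zone constraint totals
sample = [(s, a) for s in (0, 1) for a in (0, 1, 2)] * 40
target_sex = [120, 180]
target_age = [90, 110, 100]
ZONE_SIZE = 300

def error(pop):
    """Goodness of fit: summed absolute error over all constraining tables."""
    e = sum(abs(sum(p[0] == k for p in pop) - target_sex[k]) for k in range(2))
    e += sum(abs(sum(p[1] == k for p in pop) - target_age[k]) for k in range(3))
    return e

pop = random.choices(sample, k=ZONE_SIZE)    # random initial extract
e_pop = error(pop)
temp = 10.0
while temp > 0.01:
    for _ in range(200):
        candidate = pop[:]                   # swap one member at random
        candidate[random.randrange(ZONE_SIZE)] = random.choice(sample)
        e_cand = error(candidate)
        delta = e_cand - e_pop
        # Metropolis rule: always accept improvements; accept backward steps
        # with probability exp(-delta / temp), which shrinks as temp falls
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            pop, e_pop = candidate, e_cand
    temp *= 0.8                              # cooling schedule
print("final error:", e_pop)
```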


6 Which Algorithm?

Each spatial microsimulation algorithm possesses its own inherent advantages and disadvantages. The following section evaluates the strengths and weaknesses of each method in the context of issues to consider before selecting an algorithm.

First, how much preprocessing is required to obtain a robust, fit-for-purpose output? As both the deterministic reweighting and conditional probabilities algorithms reweight the resulting populations using one constraint at a time, building on the results from the previous constraint, they are sensitive to the constraint order specified within the model; the first constraint will have the greatest impact on the final "weight" assigned to an individual to live in an area. The primary difference between deterministic reweighting and conditional probabilities is the lack of a stochastic process in deterministic reweighting. As the deterministic process will give the same result every time, the impact of slight modifications to the constraint variables will be clear from the results. To model the prevalence of type 2 diabetes using a deterministic method and estimate the impact of an aging population on diabetes prevalence, the age constraint may be adapted to reflect an aging population; the model is then re-run to estimate diabetes prevalence under this aged population distribution. In contrast, the simulated annealing algorithm places equal weight on each constraint (although this can be changed at the researcher's discretion), and thus the constraint order is of no consequence to the outputs.

The number of constraints that can be specified is related to the speed of execution of each algorithm. The deterministic reweighting algorithm reweights the sample using all of the constraint information in a specified order. As more constraints are added, the differences between the sample and constraint frequency distributions can become more pronounced, especially at finer geographies where constraint populations are small. Therefore, less robust results may be produced as additional constraints are included in the model. The conditional probabilities model suffers from similar issues; however, these are less pronounced because the joint probabilities for constraint combinations are adjusted in isolation. However, increasing the number of constraints increases both the processing time and the likelihood of being unable to converge on a suitable joint probability for a constraint combination. Simulated annealing also suffers a time performance penalty as the number of constraints is increased, although the rate of increase is less severe than for the other two algorithms.

One of the major advantages that the conditional probabilities method has over the other two algorithms is that, if a sample is unavailable, a synthetic population can be created using only the aggregate information from the constraint tables. However, as the algorithm requires a sample from which to extract the initial joint probabilities, an alternative source for this information is then required; without one, erroneous individuals, such as married children, could be produced.


A major advantage of the simulated annealing approach, as discussed above, is the inclusion of the Metropolis algorithm; the drawback to this added search power is the associated higher computational time. Neither the conditional probabilities method nor the deterministic reweighting method can take backward steps when searching for a solution. Finally, both the conditional probabilities and the simulated annealing methods contain a stochastic element that results in the creation of a different population configuration each time the model is run. This allows the model to consider and produce alternative and potentially more realistic populations. Deterministic reweighting does not have this capability. However, because deterministic reweighting produces the same result with each model run, the impacts of any starting constraint change are more easily quantifiable, which can be important for policy evaluation, as discussed above.

There is only one example of these methods being compared and systematically evaluated within the published literature. Harland et al. (2012) compared the outputs of spatial microsimulation algorithms at varying spatial scales. While simulated annealing performed very well in each of the experiments, no clear winner was advocated. An interesting finding was that the simulation of attributes that are particularly influenced by spatial location is best undertaken using the deterministic reweighting algorithm. This method allows the constraint order to be tailored to best represent a particular cluster of zones and is more accurate when the purpose is to model one distinct outcome rather than recreating a generic population (Smith et al. 2011). The following case study uses the deterministic reweighting algorithm to estimate one such outcome, smoking prevalence in London.

7 Case Study: Estimating Smoking Prevalence Locally

The application of spatial microsimulation to health outcomes is a valuable extension of the earlier models of tax policy and population estimation. Local-level public health data are more readily available due to electronic patient records and regular data collection as part of the funding scheme in the UK; however, these data only reflect individuals who are registered with a general practice or visit the hospital. Patient records are protected by strict information governance, and patients' right to privacy or confidentiality prevents widespread access to or use of records which may inadvertently identify an individual. Within the UK, data governance prohibits most spatial mapping of individual health data below the Lower Super Output Area (LSOA) level (about 1,500 individuals). Even where the data exist, they may not be used for spatial analysis if there is a risk of re-identifying patients when their location is combined with demographic characteristics. Small-area estimation of health outcomes and behaviors offers a solution to restricted spatial analysis and can be accomplished by combining detailed health surveys with population census data through spatial microsimulation.


Small-area estimation of health data may be carried out using a variety of statistical methods and frameworks. Here we focus on spatial microsimulation using a deterministic reweighting algorithm, although, as mentioned previously, there are numerous alternatives, including multilevel modeling (Moon et al. 2007) and Bayesian methods (Congdon 2006). In the example to follow, we will show that the microsimulation algorithm requires only univariate population data tables at the geographic level of interest. The population census restricts multivariate (cross-tabulated) tables at lower levels with the aim of protecting identity. For this reason, spatial microsimulation has been used to estimate health outcomes down to the output area level, which typically includes about 150 individuals. If we knew how smoking prevalence varied spatially, we would be able to allocate smoking cessation services and associated resources more effectively.

There are few complete population health censuses available beyond local bespoke surveys. More frequently, countries will have national-level health surveys (such as the Health Survey for England or the New Zealand Health Survey) repeated either annually or every set number of years. These national surveys are conducted with a representative sample of the population and often include hundreds of variables to provide comprehensive profiles of respondents' health. As part of the data collection process, the surveys also collect information on basic social and demographic characteristics of respondents. These data allow respondents from the health survey to be "linked" to people from the census who share the same demographic traits. This process will be illustrated with an example of smoking prevalence estimation among adults in London (2001 Census population: 7,172,090). Data to produce the smoking estimates come from the 2008 Health Survey for England (HSE; n = 12,648 respondents).

In this example, we use logistic regression to identify the social and demographic variables, present in both the HSE and the 2001 Census, that are the best predictors of cigarette smoking among adults in England. There must be consistency in the variables between both datasets: the census provides the spatially defined population at the output area level, and the HSE gives us the probability that an individual who fits a given social and demographic profile is a smoker. For the sake of simplicity, this example is limited to four predictor variables (constraints). The logistic regression model is run using SPSS 18.0. A series of potential predictor variables are identified from the literature on smoking behavior: age, sex, ethnicity, social grade, marital status, employment status, and housing tenure. Smith et al. (2011) validated this method to predict smoking accurately in New Zealand using age, sex, ethnicity, and income data from the New Zealand Health Survey and the New Zealand Census of Population. After running a series of logistic regression models and comparing model fit, the four best predictors of smoking status from the HSE were identified as age, sex, marital status, and social grade. The constraint variables are categorical (Table 3).

There are a total of 4,765 LSOAs in London. The constraint tables are created with the total counts of individuals in each LSOA, which must sum to the same total population in each LSOA for each constraint variable. This must be checked carefully, as some people will not answer all of the questions in the census.
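The model-selection step is not tied to SPSS; a hypothetical sketch of the equivalent logistic regression in Python's statsmodels follows, where the file name and column names are invented stand-ins for an HSE extract.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical survey extract: one row per respondent, 'smoker' coded 0/1
hse = pd.read_csv("hse_2008_extract.csv")    # invented file name

model = smf.logit(
    "smoker ~ C(age_band) + C(sex) + C(marital_status) + C(social_grade)",
    data=hse,
).fit()
print(model.summary())   # compare fit across candidate constraint sets
```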


Table 3 Constraints in the smoking model, 2008 HSE data

Constraint       Category            n       %
Sex              Male                5,897   46.6
                 Female              6,751   53.4
Age              0–17                3,069   24.3
                 18–24                 575    4.5
                 25–34               1,096    8.7
                 35–44               1,321   10.4
                 45–54               1,305   10.3
                 55–64               1,234    9.8
                 65+                 4,048   32.0
Social grade     A, B                5,193   41.1
                 C1, C2              4,946   39.1
                 D, E                2,509   19.8
Marital status   Single              4,873   38.5
                 Married             5,505   43.5
                 Separated             211    1.7
                 Divorced/widowed    2,059   16.3

NB The social groups in the UK are defined as follows: A is upper middle class (higher managerial, administrative, or professional), B is middle class (managerial, administrative, or professional), C1 is lower middle class and C2 skilled working class (junior managerial, skilled manual laborers), D is working class (semi- and unskilled workers), and E are those at the lowest level of subsistence (pensioners, widows)

To ensure that there are the same total counts for each LSOA in each constraint, the numbers are proportionally adjusted to sum to the total provided in the basic population table for the LSOA. For example, in the first LSOA, the total population is known to be 1,600, but only 1,540 people answered the social grade question. To adjust for the known total population (1,600), the number of people in group AB (670) is divided by the total count in the social grade table (1,540) and then multiplied by 1,600:

$$670 / 1{,}540 \times 1{,}600 = 696.1$$

This is repeated for all of the categories in social grade, in each LSOA. As discussed previously in the spatial microsimulation literature, with the deterministic reweighting algorithm applied here, the model will smooth all of the areas to look similar to each other. This will increase the error in prevalence estimates for areas which are unique, because the local population will look very different from the “general” population. One way to minimize the tendency to higher levels of error in dissimilar places is to first identify the LSOAs which have a similar population profile in terms of the constraint variables used to predict the health behavior. This may be done by running a k-means cluster analysis (Smith et al. 2009). In this example, the LSOAs are clustered based on the percent of the population in the 18–24 age range and social groups D or E and the percent who are unmarried. All of these groups are most likely to be smokers within the constraint data. There are a total of five clusters based on these population characteristics. The model is then run five times, once for each of the clusters. The results are

1780

A. J. Heppenstall and D. M. Smith

decimal values of people as smokers or nonsmokers, or children under the age of 18. The final counts of smokers are then used to calculate the prevalence of smoking among adults in each LSOA. The estimates are mapped at the LSOA level for London in Fig. 1. Estimated prevalence ranged from 14.8% to 41.8% by LSOA.

Fig. 1 Estimates of smoking prevalance of adults in LSOA’s in London

Spatial Microsimulation

1781

The results are validated against estimates created by the National Health Service (NHS: this is the main health-care provider in the UK) Information Centre (IC), based on HSE data from 2005 to 2008. The NHS estimates are available only at the primary care organization (PCO) level, of which there are 31 in London. We aggregated the estimates from the microsimulation model to the PCO level and measured the difference in prevalence between the NHS estimates and our estimates. The mean absolute difference was 2.85% overall, with a range of values from 6.86% to 9.34% (Table 4). The greatest error was in Barking and Dagenham PCO (9.34%). The error is reasonable given the available data and the time difference between our estimates and those created by the NHS IC. Table 4 Error between microsimulation model and estimates from NHS information centre (2005–2008 data)

Primary care organization Barking and Dagenham Barnet Bexley Brent Teaching Bromley Camden Croydon Ealing Enfield Greenwich teaching City and Hackney teaching Hammersmith and Fulham Haringey teaching Harrow Havering Hillingdon Hounslow Islington Kensington and Chelsea Kingston Lambeth Lewisham Newham Redbridge Richmond and Twickenham Southwark Sutton and Merton Tower Hamlets Waltham forest Wandsworth Westminster

Prevalence error (% simulation – % PCO model) 9.34 3.65 6.03 5.03 0.94 1.30 1.45 4.38 1.18 3.64 1.27 0.67 1.56 6.86 5.88 0.18 0.94 2.56 4.82 0.63 3.37 3.37 2.22 4.20 1.61 3.20 1.26 1.84 1.14 0.49 2.59

1782

A. J. Heppenstall and D. M. Smith

The estimate error may also be measured against real-world data, where available (see Smith et al. 2011), or against a similar outcome such as diabetic amputations when predicting prevalence of diabetes (Congdon 2006). Alternatively, the error may be tested against another known value that is related to both the constraint variables and the outcome. As this deterministic reweighting example shows, there are basic steps to the simulation process that must be conducted every time: • First, identify viable aspatial dataset for the health outcome/behavior (here, the HSE). • Second, conduct statistical analyses to identify the optimal predictor variables for the outcome that are available from a spatial dataset (here, the census). • Third, using the statistical information on predictors of the health outcome, cluster the areas based on the proportion of area populations that are in the highest subgroup of each constraint (i.e., Which social grade contains the greatest proportion of smokers? Marital status group?). • Fourth, ensure that the constraint tables are prepared with the same count of people in each area across all variables. • Finally, run the model and validate against available data, similar outcome, or a related variable present in both the spatial and aspatial data.

8

Conclusions

Comprehensive socioeconomic data is not, for reasons of confidentiality, available at the individual level within any locality. This lack of detailed data has motivated research into spatial microsimulation with the intention of creating realistic synthetic populations that are representative of the geographical areas to which they represent. These populations can then input into simulation models that allow policy makers to understand the small-area impact in policy or demographics. This chapter has provided a brief overview of spatial microsimulation, focusing on the main algorithms that are typically employed. A case study of prevalence of smoking in London was presented using deterministic reweighting to show the value that this approach can bring to informing policy on health outcomes. It is clear from this example that spatial microsimulation has a great deal to offer for social simulation modeling. However, there remain many possible directions for future research. In terms of improving the modeling technique, there is no overarching consensus to which algorithm (reweighting/synthetic reconstruction technique) is the most appropriate or accurate for different domain applications or at which spatial level. This requires further research on validation of the synthetic population estimates produced by researchers. It also requires researchers to be honest about areas in which spatial microsimulation is not successful!

Spatial Microsimulation

1783

There are several criticisms that can be leveled at microsimulation. They are data hungry, computationally intensive, only model one-way interactions (the impact of policy on individuals), and weak in handling behavioral modeling. Several of these limitations can be overcome by hybridization with other individual-based models, in particular, agent-based models (see Crooks and Heppenstall 2012, and Clarke in this handbook). This is perhaps one of the most exciting areas of future research for spatial microsimulation. Realistic individual-level populations can be generated that mimic specific characteristics about a geographical area, or a specific application; for example, populations can be generated that contain characteristics of particular interest to health or education researcher. These populations can be turned into individual agents that form part of a larger modeling effort (see Wu and Birkin 2012). Hybridization allows the incorporation of both different types of behavior and detailed interactions between individuals, something which microsimulation alone is not capable of. The value of hybridization with agent-based models is that it would allow both researchers and policy makers to ask more sophisticated questions of social simulation models and in turn receive more accurate and realistic forecasts.

9

Cross-References

▶ Bayesian Spatial Analysis Acknowledgments This work was funded by the ESRC funded grant “Modeling Individual Consumer Behavior” (RES-061-25-0030) and MRC Population Health Scientist Fellowship (G0802447). The modeling framework used was developed by Kirk Harland.

References Ballas D, Rossiter D, Thomas B, Clarke G, Dorling D (2005) Geography matters. Simulating the local impacts of national social policies. Joseph Rowntree Foundation, York Beckman RJ, Baggerly KA, McKay MD (1996) Creating synthetic baseline populations. Transp Res Part A 30(6):415–429 Birkin M, Clarke M (1988) Synthesis – a synthetic spatial information system for urban and regional analysis: methods and examples. Environ Plann A 20(12):1645–1671 Birkin M, Wu BM (2012) A review of microsimulation and hybrid agent-based models. In: Heppenstall AJ, Crooks AT, See LM, Batty M (eds) Agent-based models of geographical systems. Springer, Dordrecht, pp 51–68 Brown L, Harding A (2002) Social modeling and public policy: application of microsimulation modeling in Australia. J Artif Soc Soc Simul 5:4 Congdon P (2006) Estimating diabetes prevalence by small area in England. J Public Health 28(1):71–81 Crooks A, Heppenstall A (2012) Introduction to agent-based modeling. In: Heppenstall AJ, Crooks AT, See LM, Batty M (eds) Agent-based models of geographical systems. Springer, Dordrecht, pp 85–108

1784

A. J. Heppenstall and D. M. Smith

Davies L (1987) Genetic algorithms and simulated annealing: research notes in artificial intelligence. Pitman, London Gilbert N, Troitzsch KG (2005) Simulation for the social scientist. Open University Press, Berkshire Harland K, Heppenstall AJ, Smith DM, Birkin MH (2012) Creating realistic synthetic populations at varying spatial scales: a comparative critique of population synthesis techniques. J Artif Soc Soc Simul 15:1 Moon G, Quarendon G, Barnard S, Twigg L, Blyth B (2007) Fat nation: deciphering the distinctive geographies of obesity in England. Soc Sci Med 65(1):25–31 O’Donoghue C (2001) Dynamic microsimulation: a methodological survey. Braz Electron J Econ 4:2 Openshaw S (1995) Developing automated and smart spatial pattern exploration tools for geographical information systems applications. J Roy Soc, D 44(1):3–16 Otten RHJM, van Ginneken LPPP (1989) The annealing algorithm. Kluwer, Boston Smith DM, Clarke GP, Harland K (2009) Improving the synthetic data generation process in spatial microsimulation models. Environ Plann A 41(5):1251–1268 Smith DM, Pearce JR, Harland K (2011) Can a deterministic spatial microsimulation model provide reliable small-area estimates of health behaviors? An example of smoking prevalence in New Zealand. Health Place 17:618–624 Voas D, Williamson P (2001) Evaluating goodness-of-fit measures for synthetic microdata. Geogr Environ Model 5:177–200 Williamson P, Clarke GP (1996) Estimating small-area demands for water with the use of microsimulation. In: Clarke GP (ed) Microsimulation for urban and regional policy analysis. Pion, London, pp 117–148 Wu BM, Birkin MH (2012) Agent-based extensions to a spatial microsimulation model of demographic change. In: Heppenstall AJ, Crooks AT, See LM, Batty M (eds) Agent-based models of geographical systems. Springer, Dordrecht, pp 347–360

Fuzzy Modeling in Spatial Analysis Yee Leung

Contents 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 Fundamental Concepts and Operations of Fuzzy Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1 Basic Definitions and Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Fuzzy Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Linguistic Variable and Possibility Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Spatial Data Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 Problems of the Two-Valued Logic Approach to Regionalization . . . . . . . . . . . . . . . . . . 3.2 Representation of a Region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 The Core and Boundary of a Region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Regional Assignment Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5 Grouping via Fuzzy Relation and Fuzzy Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Spatial Optimization Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1 Optimization of a Precise Objective Function over a Fuzzy Constraint Set . . . . . . . . 4.2 Optimization of a Fuzzy Objective Function over a Fuzzy Constraint Set . . . . . . . . . 5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 Cross-References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

1786 1787 1787 1793 1795 1796 1796 1797 1798 1800 1800 1805 1805 1806 1808 1809 1809

Abstract

Spatial systems are generally complex and embedded with uncertainty due to subjectivity and vagueness of human valuation and decision. Such uncertainty cannot be equated with randomness, but fuzziness of a system. Fuzzy sets, an extension of the classical notion of sets, are sets whose elements have degrees of membership/degrees of belonging to a set. Fuzzy mathematics based upon the notion of fuzzy sets provides attractive methods to model the above reality better Y. Leung (*) Institute of Future Cities, Department of Geography and Resource Management, The Chinese University of Hong Kong, Hong Kong, People’s Republic of China e-mail: [email protected] © Springer-Verlag GmbH Germany, part of Springer Nature 2021 M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_114

1785

1786

Y. Leung

than our traditional tools for formal modeling, reasoning, and computing that are crisp (i.e. dichotomous), deterministic, and precise in character. In this chapter we first describe the basic mathematical framework of fuzzy set theory in which imprecision in the sense of vagueness can be precisely and rigorously studied. Then attention is moved to applications of the theory in the field of spatial analysis with special reference to classification and grouping, as well as optimization in a fuzzy environment. Keywords

Fuzzy modeling · Fuzzy set · Region · Regionalization · Spatial data analysis · Spatial decision · Fuzzy clustering · Fuzzy spatial optimization

1

Introduction

Spatial problems at all scales have built-in complexities and imprecision/fuzziness. Compounded by multiple objectives and multiple decision makers, fuzziness and complexity are intrinsic in spatial systems. Our ability in handling complexity and fuzziness is thus crucial. Human beings can take in information in a granular and imprecise form and make rational decisions accordingly. The analysis of such behaviors calls for a formal system that can deal with the fuzziness of our language and valuation process. Fuzzy set theory (Zadeh 1965; Zimmermann 2010) is such a formalism that is instrumental to spatial analysis under uncertainty. The theory is important when human perception and valuation are involved and qualitative information requires a formal foundation for analysis. Even more importantly, fuzzy set theory is a powerful approach to bridge qualitative and quantitative analyses. Over the years, the fuzzy set approach has been developed and applied to spatial analysis in general (see for example Gale 1972 Leung 1988, Altman 1994 and Robinson 2003) and spatial economic analysis (see for example Ponsard 1988) in particular. In terms of data analysis, fuzzy clustering has made significant contributions to cluster analysis (Bezdek et al. 1984), and fuzzy optimization has extended classical optimization to the fuzzy environment (Bellman and Zadeh 1970; Verdegay 2015). Under uncertainty, deterministic mathematical models of spatial systems tend to be over-simplistic, inflexible and too mechanical. Empirical evidence often shows that too much of the human imprecision and spatial system complexities has been grossly simplified in the deterministic models. To relax the assumptions of deterministic models, the notion of uncertainty due to randomness is often injected into the model building process. However, there is another aspect of uncertainty in human systems which cannot be attributed to randomness. Often, uncertainty is due to imprecise human thought, language and perception in complex systems with incomplete information. This type of uncertainty has nothing to do with randomness but fussiness of our languages and system structures. To solve fuzzy problems, our traditional tools for formal

Fuzzy Modeling in Spatial Analysis

1787

modeling, reasoning and computing are not appropriate because they are crisp (i.e., dichotomous) and deterministic in character, and are thus inadequate to capture fuzziness and gradation in the real world. Fuzzy set theory is a mathematical theory for imprecision/fuzziness. Such uncertainty is due to fuzziness of the connotations of the terms used and reasoning involved. The theory certainly renders a more appropriate modeling of uncertainty in fuzzy spatial systems. It can bridge the linguistic and the mathematical approaches also. This chapter gives a review of fuzzy modeling in spatial analysis. It covers the fundamental concepts of human spatial perception and valuation, basic fuzzy set concepts, the concept of a region, spatial clustering and spatial optimization within a fuzzy environment. To facilitate our discussion, we start with some basic definitions and operations in fuzzy set theory in Sect. 2. We then move to fuzzy clustering in Sect. 3 and fuzzy optimization in Sect. 4. To conclude the chapter, a summary and outlook of fuzzy set theory in spatial analysis is provided.

2

Fundamental Concepts and Operations of Fuzzy Set Theory

2.1

Basic Definitions and Operations

To distinguish the notation of ordinary set and fuzzy set, the lower bar under capital letters is used to denote ordinary sets henceforth. Definition 1: Let X be a universe of discourse and x  X. Then a fuzzy set A in X is a set of ordered pairs A ¼ f½x, μA ðxÞg, 8x  X where μA: X ! M is called the membership function which maps x  X into μA(x) of a totally ordered set M, called the membership set, and μA(x) indicates the grade of membership (degree of belonging) of x in A. The membership set M can take on a general structure such as a lattice, in particular, it is the closed interval [0,1]. For example, to define a low-income country, the term “low-income” is a valuative term defined on a measurable base variable, e.g. gross national income (GNI). To characterize the term “low”, an arbitrary value, say $1,035 GNI, may be employed as the cut-off point so that any nation with a GNI lower than or equal to $1,035 is considered as a low-income country, otherwise it is not considered as a low-income country. By two-valued logic, “low-income” is a set defined by a characteristic function f : X ! f0, 1g, x ! f lowincome ðxÞ, x  X such that

ð1Þ

1788

Y. Leung

 f lowincome ðxÞ ¼

1 if x  $1035 GNI 0 otherwise:

ð2Þ

Employing such an artificial characterization, there is no imprecision associated with the term “low-income”. However, “low-income” is in fact a term with fuzzy connotation. It is not crisp and lacks a clear-cut boundary. Actually, the fuzzy connotation of the term “low-income” can be conceptualized as a set with fuzzy boundary. The transition from membership to non-membership is gradual rather than abrupt. To be natural, its characteristic function should exhibit such a gradation as depicted in Fig. 1. The basic idea is that the degree of belongingness to the term “low-income” should only change slightly if the GNI increases or decreases slightly. That is, a slight change in GNI should not lead to an abrupt change in class, such as “low-income”. The physical meaning is that a gradual instead of an abrupt transition from membership to non-membership is a norm when imprecise concepts are encountered. When the membership set is restricted to the set {0,1}, μA becomes the characteristic function f A in Eqs. (1) and (2), and A becomes an ordinary set. Therefore, an ordinary set is only a special case of a fuzzy set. Such a conceptualization is a more general and natural reflection of imprecision intrinsically embedded in terms with fuzzy connotations. This is exactly what fuzzy set theory targets at. Definition 2: A fuzzy set is empty, denoted as Ø, if μA(x) ¼ 0, 8x  X. Definition 3: A fuzzy set A is included in a fuzzy set B, denoted as A  B, if and only if μA(x)  μB(x), 8x  X. Definition 4: Fuzzy sets A and B are equal, denoted as A ¼ B, if and only if μA(x) ¼ μB(x), 8x  X. Definition 5: The support of a fuzzy set is the ordinary subset of Xdefined by suppðAÞ ¼ fxj μA ðxÞ > 0g:

Fig. 1 Membership function of “low-income” under the concept of a fuzzy set

f

low-income

1

0

2

GNI $

x

Fuzzy Modeling in Spatial Analysis

1789

Definition 6: The height of a fuzzy set A is the least upper bound of A defined by hgt ðAÞ ¼ sup μA ðxÞ: x

Definition 7: A fuzzy set A is said to be normalized if there exists a x  X such that μA(x) ¼ 1. Parallel to the theory of ordinary sets, basic operations such as union, [, intersection, \, and complementation, ⅂, on fuzzy sets can also be defined. Let F ðXÞ ¼ fAjμA: X ! ½0, 1g be the fuzzy power set of X whose elements are fuzzy sets. The operations [, \, and ⅂ on X are defined in Definitions 8, 9 and 10 respectively. Definition 8: Let A, B  F ðXÞ. The union of A and B, denoted as A [ B, is defined by μA[B ðxÞ ¼ max½μA ðxÞ, μB ðxÞ,

8x  X,

or in the logical symbol “or”, _, μA[B ðxÞ ¼ μA ðxÞ _ μB ðxÞ, 8x  X: Definition 9: Let A, B  F ðXÞ. The intersection of A and B, denoted as A \ B, is defined by μA\B ðxÞ ¼ min½μA ðxÞ, μB ðxÞ,

8x  X,

or in the logical symbol “and”, ^, μA\B ðxÞ ¼ μA ðxÞ ^ μB ðxÞ, 8x  X: Definition 10: Let A  F ðXÞ. The complement of A, denoted as ⅂A, is defined by

When the membership set M ¼ {0, 1}, Definitions 8, 9 and 10 become the union, intersection and complementation of ordinary sets. Definition 11: Let A, B  F ðXÞ. The difference of A and B, denoted as A  B, is defined by

1790

Y. Leung

1

1

1

2 1

1

0 1

Fig. 2 An illustration of α-level sets of a fuzzy set A (Source: Leung (1988), with permission from Elsevier)

Definition 12: Let A  F ðXÞ. Let x1 , x2  X. Then A is a convex fuzzy set if μA ðλx1 þ ð1  λÞx2 Þ  min½μA ðx1 Þ, μA ðx2 Þ, 8λ  ½0, 1: It should be noted that “max” and “min” are not the only operators that we can use to define the union and intersection of fuzzy sets, and the complement of A can also be defined differently. Zimmermann (2010) gives a review of more operations used in the literature. Though the union, intersection and complementation are not uniquely defined in fuzzy set theory, it actually gives more flexibility in catering for different situations in spatial analysis. Since a fuzzy set A does not have a precise boundary, then the exact identification of its elements is in general problematic. Except for those elements whose grades of membership are unity or zero, all other elements belong to a fuzzy set to a certain degree. In case exact identification is necessary, then we need to identify the ordinary set corresponding to a fuzzy set by imposing a rule for determining unambiguous membership. One way is to require the elements of the corresponding ordinary set to possess a certain grade of membership in the fuzzy set. The concept of α-level sets is thus formulated. It actually serves as an important linkage between ordinary sets and fuzzy sets. It also plays an important role in the construction of fuzzy sets by ordinary sets.

Fuzzy Modeling in Spatial Analysis

1791

Definition 13: Let A  F ðXÞ. The α-level set of A is the ordinary set Aα such that Aα ¼ fxjμA ðxÞ  αg with α  ½0, 1, and is defined by the characteristic function  f Aα ðxÞ ¼

1 if μA ðxÞ  α 0 if μA ðxÞ < α:

Example 1: Let X ¼ fx1 , x2 , x3 , x4 , x5 g. Let A ¼ (0.8, 0.5, 0, 0.1, 1). Then A1 ¼ fx5 g, A0:8 ¼ fx1 , x5 g, A0:5 ¼ fx1 , x2 , x5 g, A0:1 ¼ fx1 , x2 , x4 , x5 g, A0 ¼ X. Therefore, a α-level set is in fact an ordinary set whose elements belong to the corresponding fuzzy set at and above a certain degree (see Fig. 2 for an illustration). Employing the concept of α-level set, an important connection between ordinary sets and fuzzy sets can be established by the following theorem:

Theorem 1 (Decomposition Theorem): Let A  F ðXÞ. Then



[ α Aα:

α  ½0, 1

Theorem 1 states that fuzzy set A can be decomposed into a series of α-level sets Aα by which A can be reconstructed. Therefore, any fuzzy set can be treated as a family of ordinary sets. Definition 14: (Extension Principle of Fuzzy Sets): Let f : X ! Y. Let F ðXÞ and F ðY Þ be fuzzy power sets in X and Y respectively. Then f induces two functions f and f1 such that f induces a fuzzy set B in Y from a fuzzy set A in X and f 1 induces a fuzzy set A in X from a fuzzy set B in Y as follows: f : F ðXÞ ! F ðY Þ A ! f ð AÞ f 1 : F ðY Þ ! F ðXÞ B ! f 1 ðBÞ: Let μA, μB, μf(A), and μ f 1 ðBÞ be the corresponding membership functions of A, B, f(A), and f 1(B). Then

1792

Y. Leung

μ B ðyÞ ¼ μ μ A ðxÞ ¼ μ

f ð A Þ ðyÞ

f 1 ðBÞ ðxÞ

¼

8 < sup μA ðxÞ :

y¼f ðxÞ

0 if f 1 ðyÞ ¼ Ø:

¼ μB ð f ðxÞÞ, 8x  f 1 ðY Þ:

The extension principle is important in extending mathematical concepts and operations of a non-fuzzy system, for example, the extension of algebraic operations in ℝ to the operations of fuzzy numbers (fuzzy sets) in ℝ (see Fig. 3 for an illustration). Example 2: Let X ¼ fx1 , x2 , x3 , x4 , x5 , x6 g, Y ¼ fy1 , y2 , y3 , y4 g, and f : X ! Y be defined by 8 > < y1 if x ¼ x1 , x3 , x4 f ðxÞ ¼ y2 if x ¼ x2 > : y4 if x ¼ x5 , x6 and A  F ðXÞ be defined by A ¼ ð0, 0:5, 0:2, 1, 1, 0:6Þ: Then by the extension principle, B ¼ [max(0, 0.2, 1), 0.5, 0, max(1, 0.6)] ¼ (1, 0.5, 0, 1).

y 0

(

1)

=

1

(

2)

=

2

f B 2

A 1

0

(

2)

(

1)

x

Fig. 3 A simple representation of the extension principle (Source: Leung (1988), with permission from Elsevier)

Fuzzy Modeling in Spatial Analysis

2.2

1793

Fuzzy Relations

The concept of a fuzzy relation is constructed for the purpose of studying imprecise relations. In comparing distance, we may say that “distance x is much longer than distance y”. In describing a region, we may say that “region S is poor and densely populated”. In grouping spatial units, we may say that “place A is similar to place B”. All these relations are imprecise because the words in italics have fuzzy connotations. To have a formal analysis of such imprecise relations, formal representation of a fuzzy relation is required. Definition 15: Let X and Y be two universes of discourse. A fuzzy binary relation R in the Cartesian product X  Y is a fuzzy set in X  Y defined by the membership function μR : X  Y ! ½0, 1 ðx, yÞ ! μR ðx, yÞ, x  X, y  Y where the grade of membership μR(x, y) indicates the degree of relationship between x and y. Example 3: Let X ¼ Y ¼ ℝ. Then R ≜ “x and y are very similar” is a fuzzy binary relation which may be defined by 2

μR ðx, yÞ ¼ ekðxyÞ , k > 1: As can be observed in Fig. 4, the more similar x and y are (i.e., the smaller is their difference), the higher is their degree of similarity. Since fuzzy relations are actually fuzzy sets, then the concepts and operations of fuzzy sets can be parallelly applied to fuzzy relations. For example, the operations [, \ ,and ⅂ can be respectively defined as follows: Let F ðX  Y Þ be the fuzzy power set and R, S  F ðX  Y Þ. Then the union of R and S, denoted as R [ S, is defined by μR[S ðx, yÞ ¼ max½μR ðx, yÞ, μS ðx, yÞ, 8ðx, yÞ  X  Y: The intersection, denoted as R \ S, is defined by μR\S ðx, yÞ ¼ min½μR ðx, yÞ, μS ðx, yÞ, 8ðx, yÞ  X  Y:

1794

Y. Leung

1

x-y Fig. 4 Membership function of the fuzzy relation R ≜ “x and y are very similar” (Source: Leung (1988), with permission from Elsevier)

The complement, denoted as ⅂R, is defined by

Definition 16: Let R  F ðX  Y Þ: Then the projection of Y on X through R, denoted as RðXÞ, is defined by μRðXÞ ðxÞ ¼ sup μR ðx, yÞ y

the projection of X on Y, denoted as RðY Þ, is defined by μRðY Þ ðyÞ ¼ sup μR ðx, yÞ x

and the global projection of R, denoted as g(R), is defined by gðRÞ ¼ sup sup μR ðx, yÞ: y

x

That is, we can obtain the involving part of X and Y with respect to R(x, y) by projecting R(x, y) onto X and Y, respectively. Definition 17: Let R  F ðX  Y Þ, S  F ðY  ZÞ: The max-min composition of R and S, denoted as R o S, is defined by μRoS ðx, zÞ ¼ max min½μR ðx, yÞ, μS ðy, zÞ, x  X, y  Y, z  Z: y

Fuzzy Modeling in Spatial Analysis

1795

The compositions of fuzzy relations allow us to relay relationship in R to S and form a new fuzzy relation. Such a process is important in modeling relations and inferences in a variety of spatial problems.

Example 4: (Leung 1988): Let X ¼ fx1 , x2 g be two higher order objectives in a project evaluation process. Let Y ¼ f y1 , y2 , y3 g be three lower order objectives. Let Z ¼ f z1 , z2 , z3 g be three projects to be evaluated on their abilities in achieving the higher order objectives.

Let R ¼

y " 1 0:7 x1

y2 0:5

y3 # 0:1

x2 0:9

0:4

0:2

y1

2

z1

z2

z3

0:8

0:4

0:1

6 6 S ¼ y2 6 6 0:4 4 y3 0:1

0:9 0:2

3

7 7 0:3 7 7 5 0:9

be the respective achievement matrices with, for example, μR(x1, y3) ¼ 0.1 indicates the degree of achieving the higher order objective x1 by the lower objective y3. Then

RoS¼

z1 0:7

z2 0:5

z3 # 0:3

x2 0:8

0:4

0:3

x1

"

is the new achievement matrix with μR o S (x1, z1) ¼ 0.7, for example, indicating the degree of achieving objective x1 by project z1. To make the composition in Definition 17 more general, “min” can be replaced by an operator  as long as  is associative and monotonically nondecreasing in each argument. Furthermore, composition such as “min-max” can also be defined if situations require.

2.3

Linguistic Variable and Possibility Distribution

In conventional mathematical systems, variables are assumed to be precise. That is, a variable can be precisely measured and has exact values. However, our daily languages are generally imprecise, particularly at the semantic level. Therefore, a word in our daily languages can technically be regarded as a fuzzy set. Its imprecision is mostly possibilistic rather than probabilistic in nature. Since languages are usually used in spatial system analysis, formal representation of a linguistic variable is then essential. In general, a variable in a universe of discourse can be called a linguistic variable if its values are words of sentences in a daily or artificial language (see Zadeh 1975a,

1796

Y. Leung

b, c and 1978 for details). For example, if distance is measured in terms of words such as long and short rather than physical units such as kilometers or hours, then it can be characterized as a linguistic variable whose spatial connotation is imprecise (Leung 1982). Definition 18: A linguistic variable is designated by a quintuple ðX, T ðXÞ, X , G, MÞ . The symbol X is the name of the variable, for example “distance”. T(X) is the term set of X, finite or infinite, which consists of terms F, linguistic values, such as long, short, very long, rather short, not long, and long or short in a universe of discourse X . In terms of physical distance, elements in X may be unit distance in kilometers or hours. The terms in the term set are generated by a syntactic rule G. The semantic rule M associates each term its meaning, a fuzzy set in X. Through M, a compatibility function μ : X ! ½0, 1 is constructed to associate each element in X its compatibility with a linguistic value F. Therefore, a linguistic variable is a fuzzy variable whose values are fuzzy sets in a universe of discourse. The base variable of the linguistic variable is a precise variable which takes on individual values in its domain, i.e. the universe of discourse X. The domain of the linguistic variable is, however, the collection of all possible linguistic values, fuzzy sets, defined in the same universe of discourse through the base variable.

3

Spatial Data Clustering

Clustering of spatial data into groups is perhaps one of the most common spatial data analysis tasks in geographical research. The identification of regions and the grouping of spatial units according to their attributes into regions are typical and fundamental problems. In here, we will first discuss the problem of using the conventional binary logic for spatial clustering, then we will introduce the fuzzy set method and its advantages for accomplishing such tasks.

3.1

Problems of the Two-Valued Logic Approach to Regionalization

Region is the basic unit of regional analysis, and regionalization (spatial clustering) is a fundamental issue in spatial analysis in general and regional economics in particular. They are crucial in organizing spatial information, characterizing and identifying spatial structures and processes, and constructing a suitable basis for the formulation of spatial theories and planning policies. (See for example the reviews by Berry 1967 and Fischer 1984). Conventionally, a region is treated as a theoretical construct which can be precisely characterized and delimited. Boundaries separating regions and nonregions are precise. The assignment of spatial units to regions is again precise. Therefore,

Fuzzy Modeling in Spatial Analysis

1797

space can be subdivided into mutually exclusive and exhaustive regions. Thus, regional characterization, regional assignment and regionalization are all based on a two-valued logic which neglects the intrinsic fuzziness in our spatial classification and grouping problems. In reality, such a rigid system is unnatural and is generally inadequate in handling fuzziness in most regionalization problems because of the gradual transition of phenomena in space and the imprecision of the terms we use to describe a region. To accommodate quantitative and qualitative information with varying degrees of imprecision, a finer system of representation and a more flexible regionalization procedure are thus necessary. The fuzzy set approach provides such means to formally analyze regional concepts. In what follows, we discuss how a region can be characterized and how regionalization/clustering can be done under fuzziness.

3.2

Representation of a Region

n o Let Y ¼ Y k , k ¼ 1, . . . , ‘, be a set of ‘ differentiating characteristics by which regions are characterized. Let Z be a region which is identified as F. Then region Z can be characterized by the joint linguistic proposition P: P : Y ðZÞ is F

ð3Þ

where Y ¼ (Y1, . . . , Yk, . . . , Y‘) is a ‘-ary linguistic variable; F ¼ F1   Fk   F‘ is a ‘-ary relation in Y 1   Y k   Y ‘ and “Y(Z ) is F ” reads Y of region Z is F. The linguistic proposition in Eq. (3) induces a joint possibility distribution (Zadeh 1975a, b, c; Dubois and Prade 2002) ΠY ðZÞ ¼ F

ð4Þ

such that   Poss ðY ðZ Þ ¼ yÞ ¼ ΠY ðyÞ ¼ μF ðyÞ ¼ min μF1 ðy1 Þ, . . . , μFk ðyk Þ, . . . , μF‘ ðy‘ Þ ð5Þ where y ¼ ðy1 , . . . , yk , . . . , y‘ Þ ϵ Y with yk ϵ Y k , k ¼ 1, . . . , ‘; ΠY is the corresponding joint possibility distribution function of ΠY ; and μF is the membership function of the fuzzy ‘-ary relation F with Fk being a fuzzy set in Y k , k ¼ 1, . . . , ‘. Example 5: Let Y ¼ income (in montary unit) be the only characteristic by which economic regions are identified. Let Z be a region which is characterized as a “moderate income region”. With respect to the measurable base variable income (in monetary unit), the connotation of moderate is vague. It imposes a fuzzy restriction on the base variable income. Formally, Z can be characterized by the linguistic proposition

1798

Y. Leung

P : IncomeðZ Þ is moderate,

ð6Þ

which induces a possibility distribution ΠіnсоmеðZÞ ¼ moderate,

ð7Þ

such that PossðIncomeðZÞ ¼ aÞ ¼ Πіnсоmе ðaÞ ¼ μmoderate ðaÞ:

ð8Þ

Here, the linguistic variable in Y is Y ≜ “income”. Since a region is a fuzzy concept which is linguistically characterized and eventually materialized as a fuzzy set, it is essential to define the core and boundary of a region.

3.3

The Core and Boundary of a Region

Except for special cases, cores and boundaries can be considered as intrinsically fuzzy. Based on the characterizations in Eqs. (6), (7) and (8), the core of the region Z may be defined as the point or area in space having characteristics y ϵ Y (Leung 1988). Definition 19: The core of a region Z, denoted as CORE(Z ), is defined by  CORE ðZÞ ¼

y j μF ðy Þ ¼ hgt ðFÞ ¼ supμF ðyÞ y

  ¼ min sup μF1 ðy1 Þ, . . . , sup μFk ðyk Þ, . . . , sup μF‘ ðy‘ Þ, . . . : y1

yk

y‘

ð9Þ

Since the linguistic characterization has imprecise spatial connotations, the corresponding boundary is naturally fuzzy. Based on Eqs. (6), (7) and (8), the boundary of the region Z can be defined as follows. Definition 20: The boundary of a region Z, denoted as BOUNDARY(Z), is defined by BOUNDARY ðZ Þ ¼ fyj0 < μF ðyÞ < 1⟺0   < min μF1 ðy1 Þ, . . . , μFk ðyk Þ, . . . , μF‘ ðy‘ Þ < 1 :

ð10Þ

The above fuzzy conceptualization of a regional boundary has led to a series of studies on intermediate boundaries in regional analysis (Burrough and Frank 1996).

Fuzzy Modeling in Spatial Analysis

1799

However, in practice, a precise boundary may be required and we can always delimit it by restricting μF( y) in Definition 20 to a specific value α  (0, 1). Definition 21: The α-boundary of a region Z, denoted as α-BOUNDARY(Z ), is defined by α  BOUNDARYðZÞ ¼ fyjμF ðyÞ ¼ αg 0 < α < 1:

ð11Þ

Therefore, the α-boundary of a region is a line connecting all points in space whose compatibility to the characterization of the region is α. Having imposed a precise boundary, the corresponding region can in turn be precisely delimited as the α-level set of F: Z ¼ fyjμF ðyÞ  αg 0 < α < 1

ð12Þ

with the characteristic function  f F ðyÞ ¼

1 0

if μF ðyÞ  α otherwise:

ð13Þ

That is, the α-boundary delimits a region whose elements are at least α-compatible to the characteristics of the region. The value serves as a qualifying threshold for the region. It can be viewed as a defuzzification mechanism if a precise boundary is desired. If two regions are fuzzily defined, then how can they be separated? The answer to such a question relies on the application of the following separation theorem in fuzzy set theory (Zadeh 1965).

Theorem 2 Let F and G be bounded and convex in En with maximal grades

MF and MG such that MF ¼ supμF ðyÞ y

MG ¼ supμG ðyÞ: y

Let M be the maximal grade of F \ G such that M ¼ sup min½μF ðyÞ, μG ðyÞ: y

Then, the highest degree of separation of F and G is S ¼ 1  M:

1800

Y. Leung

Applying the above theorem, we can then define and demarcate the boundary between two regions (Leung 1988). Definition 22: The boundary between regions Z and W, denoted as BOUNDARY (Z, W), is defined by BOUNDARYðZ, W Þ ¼ fyj0 < min ½μF ðyÞ, μG ðyÞ  Mg:

ð14Þ

Depending on F \ G, we have the following three cases: Case 1: Case 2: Case 3:

3.4

F \ G ¼ ∅ ) M ¼ 0, S ¼ 1 ) Z and W are completely different and can be unambiguously separated. F ¼ G ) F \ G ¼ F(¼G) ) M ¼ 1, S ¼ 0 ) BOUNDARY (Z, W) ¼ {y| 0 < μF(or G)( y) < 1}, (the single-region case). F \ G 6¼ ∅ , F ⊈ G and G ⊈ F ) M  (0, 1), S  (0, 1)) BOUNDARY (Z, W) ¼ {y| 0 < min [μF( y), μG( y)]  M}, (Definition 22)

Regional Assignment Procedure

Let P and Q be the characterizations of region Z and spatial unit X, respectively. Then, the possibility of X being assigned to Z is obtained through the following assignment rule: h i PossðQjPÞ ¼ PossðY ðXÞ is HjY ðZÞ is FÞ ¼ sup min μH ðyÞμF ðyÞ y    ¼ min sup min μH1 ðy1 Þ, μF1 ðy1 Þ , . . . , y1

   sup min μHk ðyk Þ, μFk ðyk Þ , . . . , sup min μHℒ ðyℒ Þ, μFℒ ðyℒ Þ yk

yℒ

ð15Þ  

:

Thus, the possibility of assigning spatial unit X to region Z depends on how well the pattern H characterizing X matches F characterizing Z.

3.5

Grouping via Fuzzy Relation and Fuzzy Clustering

Conventionally, spatial units are grouped together by either their similarities or differences/distances in attributes. This problem can in general be treated as a clustering problem, in which spatial units in the same region are as similar as possible, and spatial units belonging to other regions are as dissimilar as possible. Within the framework of fuzzy set, a similarity matrix, a fuzzy relation, can be constructed to represent the degree to which two units are similar or different. We first discuss the fuzzy graph approach to clustering/regionalization (Leung 1988). Then we discuss the fuzzy c-mean approach to spatial clustering (Dunn 1973; Bezdek et al. 1984).

Fuzzy Modeling in Spatial Analysis

1801

3.5.1 Grouping Based on the Concept of Similarity Let R be the similarity matrix. Without loss of generality, let R be a fuzzy binary relation with μR(xi, xj)  [0, 1] indicating the degree to which spatial units xi and xj are similar. The higher its value is, the higher the similarity of xi and xj becomes. The relation R is obviously reflexive and symmetric and is thus a resemblance relation. However, regionalization cannot be meaningfully performed unless R is transitive also. That is, R should be a similitude relation with the following properties: μR ðxi , xi Þ ¼ 1, 8ðxi , xi Þ  X  X,





μR xi , x j ¼ μR x j , xi , 8 xi , x j  X  X,

ð16Þ ð17Þ









μR ðxi , xk Þ  max min μR xi , x j , μR x j , xk , 8 xi , x j , x j , xk , ðxi , xk Þ  X  X: ð18Þ xj

Reflexivity and symmetry of R are respectively stated in Eqs. (16) and (17), while max-min transitivity is imposed by Eq. (18). The physical meaning of Eq. (18) is that the similarity of spatial units xi and xk should be at least as high as the lowest of the similarities of xi and xj and of xj and xk, for all intermediate xj’s. Compared to the notion of transitivity in conventional analysis, the present one defines a weak transitivity of similarity. Since not all similarity matrices are max-min transitive, we need a procedure to transform an intransitive similarity matrix R into a transitive one. A natural method is b of R. to derive the transitive closure, R, Definition 23: The max-min transitive closure of a fuzzy binary relation R is Rb ¼ R [ R2 [ R3 [ . . .

ð19Þ

where R2 ¼ R ∘ R is the max-min composition of R. The element μR(xi, xj) indicates the max-min transitive similarity of xi and xj. The following theorem shows how such a transitive closure can be obtained. Theorem 3 Let R be any fuzzy binary relation. If for some k, the max-min

composition Rk + R [ R2 [ . . . [ R k .

1

¼ Rk, then the max-min transitive closure is Rb ¼

Definition 24: The strongest path from spatial unit xi to xj is defined by



‘ xi , x j ¼ max ‘ xq1 ¼ xi , . . . , xqr ¼ x j C ð xi , x j Þ

ð20Þ

1802

Y. Leung

where C(xi, x j) is the ordinary set of all paths from xi to xj with the value of each path xq1 , . . . , xqr defined by





 ‘ xq1 , . . . , xqr ¼ min μR xq1 , xq2 , . . . , μR xqr1 , xqr : Thus, the strongest path from xi to xj is the strongest of the weakest links between xi and xj. Since





μb xi , x j ¼ ‘ xi , x j , 8 xi , x j  X  X, R then the max-min transitive similarity of xi and xj is actually the strongest path from xi to xj. Thus, the similarity of xi and xj can be interpreted as the strongest of the weakest similarities of xi and xj in the path. Based on Theorem 1, the similitude relation Rb (or R, if R is transitive) can be decomposed into Rb ¼

bα [ αR

α  ½0, 1

ð21Þ

where

f b xi , x j ¼ R

(



1 if μb xi , x j  α R

0 if μb xi , x j < α

ð22Þ

R

and bα R bα : α1 > α2 ) R 1 2

ð23Þ

bα is reflexive, symmetric and transitive in the sense of ordinary sets, it is an Since R equivalence class (equivalence relation) of level α. Within each α-level equivalence class, the of any similarity

two spatial units is no less than α. For example, if α ¼ 0.6, then μR xi , x j > 0:6, 8 xi , x j  R0:6 . Therefore, it is natural to equate each equivalence class with a region, bα ≜α-region. All spatial units within a α-region is α-similar with degree of i.e. R similarity greater than or equal to α. For a specific value of α, we have a specific grouping of spatial units into regions. By varying the value of α, we could observe the variations of grouping for regions. Such a hierarchical regionalization procedure is natural and more informative. It can model the gradual and abrupt transitions from membership to non-membership of spatial units among regions. If we define distance (dissimilarity) as the negation of similarity, then parallel to the concept of similarity, transitivity in distance is also essential for any grouping procedure. The basic fuzzy set concept which models closely such an idea is a dissimilitude relation R which satisfies the following properties:

Fuzzy Modeling in Spatial Analysis

1803





μR xi , x j ¼ 0, 8 xi , x j  X  X,





μR xi , x j ¼ μR x j , xi , 8 xi , x j  X  X, 







μR ðxi , xk Þ  min max μR xi , x j , μR x j , xk , 8 xi , x j , x j , xk , ðxi , xk Þ  X  X: xj

ð24Þ ð25Þ ð26Þ

Thus, a dissimilitude relation is anti-reflexive, (see Eq. (24)) and symmetric (see Eq. (25)). Moreover, it is min-max transitive. The meaning of Eq. (26) is that the distance (dissimilarity) between spatial units xi and xk should be at most as long as the longest of the distances between xi and xj or that between xj and xk, for all intermediate xj’s. If a fuzzy binary relation only satisfies anti-reflexivity and symmetry, then it is a dissemblance relation. Matrices whose elements are generalized relative Hamming distances or dissemblance relations. Since not all dissemblance relations are transitive, we need a procedure to obtain transitive closures from intransitive relations. Definition 25: The min-max transitive closure of a fuzzy binary relation R is R ¼ R \ R2 \ R3 \ . . .

ð27Þ

where R2 ¼ R  R is the min-max composition of R. The element μR(xi, xj) denotes the min-max transitive distance between xi and xj. The higher its value is, the higher the dissimilarity of xi and xj becomes. The following theorem can be employed to obtain R from R. Theorem 4: Let R be any fuzzy binary relation. If for some k, the min-max

composition Rk + 1 ¼ Rk, then the min-max transitive closure is R ¼ R \ R2 \ R3 \ . . . \ Rk .

Definition 26: The weakest path from spatial unit xi to xj is defined by

ℒ  xi , x j ¼



ℒ xq1 ¼ xi , . . . , xqr ¼ x j Cðx1 , ..., x j Þ min

ð28Þ

where C(xi, xj) is the ordinary set of all paths from xi to xj defined by





L xq1 ¼ xi , . . . , xqr ¼ x j ¼ max μR xq1 , xq2 , . . . , μR xqr1 , xqr :

ð29Þ

Thus, the weakest path from xi to xj is the shortest of the longest links between xi and xj. Since





μR xi , x j ¼ L  xi , x j 8 xi , x j  X  X ð30Þ

1804

Y. Leung

the distance between spatial units xi and xj is the weakest path from xi to xj. Therefore, the dissimilarity of xi and xj can be interpreted as the weakest of the strongest dissimilarities of xi and xj in the path. Again, Ř can be decomposed into equivalence classes with respect to the degree of dissimilarity α  [0, 1]. Within each equivalence class, Řα , the dissimilarity between any two spatial units is no larger than α. Therefore, the α-level sets can be treated as α-regions within which any two elements are of α-distance from one another. For a specific value of α, a specific grouping pattern of spatial units is obtained. By considering all values of α in Ř, a hierarchical pattern is derived. Since antireflexivity in Eq. (24) and reflexivity in Eq. (16) are complementary, and min-max transitivity in Eq. (26) is the complement of max-min transitivity in Eq. (18), it is intuitive to expect that patterns based on the similitude relation and dissimilitude relation are identical. Based on different definitions of similarity/ distance, the decomposition of the corresponding transitive closures yields different results with disjoint or nondisjoint groups (Leung 1988). Besides the above fuzzy-relation approach, fuzzy clustering techniques (Bezdek et al. 1984) can also be employed to group spatial units into regions/regional types. These techniques give more statistical-like arguments to the grouping procedures.

3.5.2 Grouping for Regions via Fuzzy c-Means Clustering In conventional clustering, spatial units/data points are partitioned into distinct clusters so that each spatial unit/data point belongs to one and only one cluster. That is, the principle of mutual exclusiveness and exhaustiveness are enforced. There is no ambiguity about the belongingness to different clusters. However, as argued above, under fuzziness, a spatial unit may possess the characteristics of more than one cluster. Instead of an all-or-nothing belongingness, a spatial unit may be assigned a grade of membership in each cluster. In other words, units which are closer to a cluster center may have higher grade of membership than those which are on the periphery. Fuzzy c-means (FCM) clustering (Dunn 1973; Bezdek et al. 1984) is perhaps the most common technique employed to perform fuzzy clustering. The general idea is to partition n data points into c clusters under a specific criterion. Technically, let X ¼ {x1, . . . , xi, . . . , xn} be n data points. Let V ¼ {v1, . . . , vj, . . . , vc} be c clusters. To partition X into c clusters, the best criterion is to minimize the total sum of the squared distances between the data points and the cluster centers. Under fuzziness, we need to take into account the degree of belonging of a data point to a cluster. For data point xi, the weighted sum of squared distances to all cluster centers is xi ¼

Xc

xi  v j 2 w ij j¼1

ð31Þ

where vj is the cluster center and wij is the degree of data point xi belonging to vj Pn ðwik Þm xk v j ¼ Pk¼1 n m k¼1 ðwik Þ

ð32Þ

Fuzzy Modeling in Spatial Analysis

1805

with m denoting a parameter that can be used to change the relative grade of membership of a data point to the clusters. Taking all data points together, then the total sum of squared distances becomes 2 d ðX, V Þ ¼ Σni¼1 Σcj¼1 wm ij xi  v j :

ð33Þ

The FCM clustering algorithm is a means to minimize the above expression through an iterative method where

wij ¼

Σnk¼1



kxi  v j k kxi  vk k

!1 2

m1

:

ð34Þ

To sum up, the FCM algorithm can be summarized as follows: Step 1: Select the of clusters c. Set the initial grade of membership matrix n number o q ðqÞ W ¼ wij , q ¼ 0, 1, 2, . . . , is the iterative step. Step 2: Compute the cluster center at the q-th step Pn q m wik xk v j ¼ Pk¼1 n q m : k¼1 wik Step 3: Adjust W

(q)

via wij ¼

Σnk¼1



2 !1 jjxi v j jj m1 jjxi vk jj

.

Step 4: Stop, if kW (q + 1)  W (q)k  ϵ. Go to Step 2 otherwise. The fuzzy c-means algorithm has been applied to the grouping of spatial data in general and geographical information in particular; see for example the unmixing of coarse pixel signatures to identify land cover classes (Bastin 1997), grouping of attribute data derived from digital elevation model (De Bruin and Stein 1998), and the identification of hotspots for forest fire prevention (Di Martino and Sessa 2011). A general review of the fuzzy c-means development can be found in Miyamoto et al. (2008).

4

Spatial Optimization Problems

4.1

Optimization of a Precise Objective Function over a Fuzzy Constraint Set

A variety of spatial planning problems require the optimization of a precise objective function subject to a set of capacities and technological conditions which are in a state of uncertainty. Due to the imprecision of capacity available, it is more

1806

Y. Leung

appropriate to specify a tolerance interval with respect to the intended value. The corresponding constraint becomes fuzzy and lacks a clear-cut boundary. The implication then is that we are allowing for a certain extent of constraint violations. To guarantee a rational and logically consistent decision-making process under imprecision, it is important to establish a linear ordering on the set of alternatives X in terms of the extent to which a fuzzy constraint is satisfied. A natural way is to define the fuzzy constraint in X as a fuzzy set in X (Zimmermann 1976; Leung 1988). To obtain the solution of the optimization problem, it is sufficient to solve the following conventional parametric linear programming problem (see Verdegay 1982, 2015 and Chanas 1983): max f ðxÞ s:t: gi ðxÞ  bi þ dð1  αÞ i ¼ 1, . . . , m, α  0, x  0

ð35Þ

where d indicates the extent to which the deviation from the capacity is allowed. The problem in Eq. (35) is apparently a conventional parametric linear programming problem with parameter (1  α)  [0, 1], where α is the α-level sets of the fuzzy constraint set.

4.2

Optimization of a Fuzzy Objective Function over a Fuzzy Constraint Set

Assume that both the objective and constraints of a spatial planning problem are fuzzy and are defined as fuzzy sets on the set of alternatives in X. In formulating the decision space, objectives and constraints are structurally identical and can be treated symmetrically. According to Bellman and Zadeh (1970), the optimal alternative has to satisfy the fuzzy objective and fuzzy constraints simultaneously, the fuzzy decision space can then be derived as the intersection of the fuzzy objective and fuzzy constraints. Let (X, μ0 , μ1 , . . . , μm ) be the optimization system with μ0 defining the fuzzy objective and μi defining the fuzzy constraints i, i ¼ 1, . . . , m. The fuzzy decision space D is then defined by μD ðxÞ ¼ min½μ0 ðxÞ, μ1 ðxÞ, . . . , μm ðxÞ which is the overall satisfaction function of the fuzzy objective and constraints. The optimal solution is the alternative x such that μD ðxÞ ¼ sup min½μ0 ðxÞ, μ1 ðxÞ, . . . , μm ðxÞ: xX

The solution concept is based on the rationale that confluence of the fuzzy objective function and fuzzy constraints is a fuzzy set which defines a linear ordering on the set

Fuzzy Modeling in Spatial Analysis

1807

of alternatives in X. The best alternative should then be the one which is the highest in order, i.e. the one which gives the highest degree of overall satisfaction. As an example, to study general equilibrium in a market (Ponsard 1986, 1988), we need to consider all consumers and producers in the market. On the consumption side, consumers’ spatial partial equilibriums can be stated as

μDi xi ¼ sup μDi ðxi Þ ¼ sup μU i ðxi Þ xi  X i

xi  A i

(36)

where xi is the optimal consumption at equilibrium, Xi is the set of possible located consumptions of consumer i, Di ¼ Ui \ Ci is a fuzzy set in Xi defining the demand space of consumer i (Ui and Ci are  the fuzzy utility and budget  constraint of consumer i, respectively), and Ai ¼ xi jμC ðxi Þ  μUi ðxi Þ, xi  Xi . Considering all consumers, the P Pmset of total possible located consumption is X¼ m i¼1 Xi whose element x ¼ i¼1 xi is the total consumption in the economic space S. On the production side, producers’ spatial partial equilibriums can be stated as



μY j ðyj Þ ¼ sup μDj xj ¼ sup μPj xj , yj  Y j

yj  B j

(37)

where Y j is the set of possible productions of producer j, Dj ¼ Pj \ Ej is a fuzzy set in Y j defining the supply space of producer j (Pj and Ej are the fuzzy utility of profit and technological constraint of producer j, respectively), and B j ¼ fy j jμE ðy j Þ  μP ðy j Þ, y j  Y j g. Taking all producers into consideration, the set of total possible productions is n Y ¼ Σm i¼1 Y j whose element y ¼ Σj¼1 y j is the total production in S. Employing the median as an operator, the fuzzy total production Y in Y is defined by μY : Y ! ½0, 1 "

ð38Þ #



y ! μY ðyÞ ¼ Mdðyj Þ sup μDj xj : yj  Y j

(39)

When spatial decision making involves more than one objective, a fuzzy multiobjective optimization problem can be formulated. The basic idea of the fuzzy multiobjective linear programming approach is that individual objectives are to be maximized to their fullest extent within tolerable intervals of compromising. The purpose is to find a compromise solution with respect to the ideal solution.

Let (f1 , . . . , fh , . . . , fq) be the ideal solution: μ1 ðf1 Þ, . . . , μ2 ðfh Þ, . . . , μq fq ¼ (1, . . ., 1, . . ., 1). The ideal solution, generally infeasible, is the one which maximizes simultaneously all objective functions. Then the maximal distance between the ideal alternative and any alternative x is

1808

Y. Leung

( d1 ¼ max 1  h

PP k

k k j chj x j

dh

 fh

) ð40Þ

where $d_\infty$ is the maximal distance between the ideal alternative and any alternative x, $c_{hj}^k$ is the unit return of producing activity j in region k under objective h, and $x_j^k$ is the production level of activity j in region k. The value $f_{h*} = \min\left[ f_h(x_1^*), \dots, f_h(x_h^*), \dots, f_h(x_q^*) \right]$, $h = 1, \dots, q$, can be defined as the worst permissible value for objective h; $d_h = f_h^* - f_{h*}$ is the length of the tolerance interval; and $f_h^* = f_h(x_h^*)$ is the optimal value of objective h. Thus, a compromise solution can be obtained through the minimization of the maximal distance $d_\infty$. That is, we need to solve

$$\begin{aligned} \min\ & d_\infty \\ \text{s.t. } & \frac{f_h^* - \sum_k \sum_j c_{hj}^k x_j^k}{d_h} \le d_\infty, \quad h = 1, \dots, q \\ & \sum_k \sum_j a_{ij}^k x_j^k \le b_i^k, \quad i = 1, \dots, m;\ k = 1, \dots, l \\ & x_j^k \ge 0, \quad j = 1, \dots, n;\ k = 1, \dots, l \end{aligned} \qquad (41)$$

The physical meaning of the interregional equilibrium problem in Eq. (41) is that, to reach a compromise, we search for an efficient solution which minimizes the maximal distance between a spatial alternative and the ideal alternative (Zimmermann 1978; Leung 1988). Verdegay (2015) reviews the progress of fuzzy mathematical programming, particularly from a personal perspective. In addition to highlighting the fundamentals of fuzzy mathematical programming, he discusses various applications to practical problems such as the transportation of solid wastes, shortest paths, optimal flows, and facility location. He also suggests hybridizing fuzzy mathematical programming with other soft computing methods to make spatial optimization more flexible and applicable.
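As a concrete illustration of the formulation in Eq. (41), the following sketch solves a tiny single-region instance (l = 1, two activities, two objectives) with scipy.optimize.linprog; all coefficients are hypothetical, and the worst permissible values are obtained by cross-evaluating each objective at the other objectives' individual optima.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical single-region instance: two activities (columns), two objectives (rows).
c = np.array([[3.0, 1.0],     # unit returns c_hj under objective h = 1
              [1.0, 2.0]])    # unit returns c_hj under objective h = 2
A = np.array([[1.0, 1.0],     # resource constraints a_ij x_j <= b_i
              [2.0, 1.0]])
b = np.array([10.0, 14.0])

# Ideal values f_h^* (individual maxima) and worst permissible values f_h*
# (each objective evaluated at the other objectives' optima).
f_star = np.empty(2)
x_opt = []
for h in range(2):
    res = linprog(-c[h], A_ub=A, b_ub=b)   # maximize c_h x over the feasible set
    f_star[h] = -res.fun
    x_opt.append(res.x)
f_worst = np.array([min(c[h] @ x for x in x_opt) for h in range(2)])
d = f_star - f_worst                        # tolerance interval lengths d_h

# Minimize d_inf subject to (f_h^* - c_h x) / d_h <= d_inf and A x <= b;
# the decision vector is (x_1, x_2, d_inf), all nonnegative by default.
obj = np.array([0.0, 0.0, 1.0])
A_ub = np.vstack([np.hstack([-c / d[:, None], -np.ones((2, 1))]),
                  np.hstack([A, np.zeros((2, 1))])])
b_ub = np.concatenate([-f_star / d, b])
res = linprog(obj, A_ub=A_ub, b_ub=b_ub)
print("compromise x =", res.x[:2], " d_inf =", res.x[2])
```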

5 Conclusion

Based on the fuzzy set approach to spatial analysis and planning under vagueness of our linguistic and spatial systems (as systematically examined in Leung 1988) in general, and of human economic values and behaviors (as systematically summarized in Ponsard 1988) in particular, fuzzy set and fuzzy logic methods have been extensively applied to geographical studies and spatial economic analyses. Throughout this process, other paradigms such as neural networks, genetic algorithms, and the recent burgeoning growth of deep learning (Goodfellow et al. 2016) have provided a variety of methodologies to study geographical structures and processes as well as regional economic problems. In addition to further deepening the theory and extending the practical applications of fuzzy modeling of human spatial behaviors, successful integration of fuzzy set theory with other emerging soft computing paradigms will be of great value for the study of complex spatial systems, particularly in the big data era.

6 Cross-References

▶ Exploratory Spatial Data Analysis
▶ Geospatial Analysis and Geocomputation: Concepts and Modeling Tools
▶ Methods of Environmental Valuation
▶ Spatial Equilibrium in Labor Markets

References

Altman D (1994) Fuzzy set theoretic approaches for handling imprecision in spatial analysis. Int J Geogr Inf Syst 8(3):271–289
Bastin L (1997) Comparison of fuzzy c-means classification, linear mixture modelling and MLC probabilities as tools for unmixing coarse pixels. Int J Remote Sens 18(17):3629–3648
Bellman RE, Zadeh LA (1970) Decision-making in a fuzzy environment. Manag Sci 17(4):B141–B164
Berry BJL (1967) Grouping and regionalization, an approach to the problem using multivariate analysis. In: Garrison WL, Marble DF (eds) Quantitative geography, part I: economic and cultural topics. Northwestern University Press, Evanston, pp 219–251
Bezdek JC, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput Geosci 10(2–3):191–203
Burrough PA, Frank A (1996) Geographic objects with indeterminate boundaries. Taylor and Francis, London
Chanas S (1983) The use of parametric programming in fuzzy linear programming. Fuzzy Sets Syst 11(1–3):243–251
De Bruin S, Stein A (1998) Soil-landscape modelling using fuzzy c-means clustering of attribute data derived from a digital elevation model (DEM). Geoderma 83(1–2):17–33
Di Martino F, Sessa S (2011) The extended fuzzy C-means algorithm for hotspots in spatio-temporal GIS. Expert Syst Appl 38(9):11829–11836
Dubois D, Prade H (2002) Possibility theory, probability theory and multiple-valued logics: a clarification. Ann Math Artif Intell 32(1–4):35–66
Dunn JC (1973) A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J Cybern 3(3):32–57
Fischer MM (1984) Tassonomia regionale: alcune riflessioni sullo stato dell'arte. In: Clemente F (ed) Pianificazione del Territorio e Sistema Informativo. Franco Angeli Libri, Milan, pp 365–400
Gale S (1972) Inexactness, fuzzy sets and the foundation of behavioral geography. Geogr Anal 4(4):337–349
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge
Leung Y (1982) Approximate characterization of some fundamental concepts of spatial analysis. Geogr Anal 14(1):29–40
Leung Y (1988) Spatial analysis and planning under imprecision. Elsevier, Amsterdam
Miyamoto S, Ichihashi H, Honda K (2008) Algorithms for fuzzy clustering methods in c-means clustering with applications. Springer, Berlin
Ponsard C (1986) A theory of spatial general equilibrium in a fuzzy economy. In: Ponsard C, Fustier B (eds) Fuzzy economics and spatial analysis. Librairie de l'Université, Dijon, pp 1–27
Ponsard C (1988) Fuzzy mathematical models in economics. Fuzzy Sets Syst 28(3):273–283
Robinson VB (2003) A perspective on the fundamentals of fuzzy sets and their use in geographic information systems. Trans GIS 7(1):3–30
Verdegay JL (1982) Fuzzy mathematical programming. In: Gupta MM, Sanchez E (eds) Fuzzy information and decision processes. North-Holland, Amsterdam, pp 231–237
Verdegay JL (2015) Progress on fuzzy mathematical programming: a personal perspective. Fuzzy Sets Syst 281(15):219–226
Zadeh LA (1965) Fuzzy sets. Inf Control 8(3):338–353
Zadeh LA (1975a) The concept of a linguistic variable and its application to approximate reasoning, I. Inf Sci 8(3):199–249
Zadeh LA (1975b) The concept of a linguistic variable and its application to approximate reasoning, II. Inf Sci 8(4):301–357
Zadeh LA (1975c) The concept of a linguistic variable and its application to approximate reasoning, III. Inf Sci 9(1):43–80
Zadeh LA (1978) Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst 1(1):3–28
Zimmermann HJ (1976) Description and optimization of fuzzy systems. Int J Gen Syst 2(1):209–215
Zimmermann HJ (1978) Fuzzy programming and linear programming with several objective functions. Fuzzy Sets Syst 1(1):45–55
Zimmermann HJ (2010) Fuzzy set theory. Wiley Interdiscip Rev Comput Stat 2(3):317–332

Section X Spatial Statistics

Geostatistical Models and Spatial Interpolation
Peter M. Atkinson and Christopher D. Lloyd

Contents
1 Introduction
2 Characterizing Spatial Variation
2.1 The Experimental Variogram
2.2 Variogram Model Fitting
2.3 Example
3 Spatial Interpolation with Kriging
3.1 Simple and Ordinary Kriging
3.2 Simulation
4 The Change of Support Problem
4.1 Regularization
4.2 Variogram Deconvolution
4.3 Area-to-Point Kriging
5 Geostatistics and Regional Science
6 Conclusions
References

Abstract

Characterizing the spatial structure of variables in the regional sciences is important for several reasons. Firstly, the spatial structure may itself be of interest. The structure of a population variable tells us something about how the population is configured spatially. For example, is the population clustered by some properties, but not others? Secondly, mapping variables from sparse sample observations or transferring values between areal units requires knowledge of how the property of interest varies spatially. Thirdly, we require knowledge of spatial variation in order to design sampling strategies which make the most of the effort, time, and money expended in sampling. Geostatistics comprises a set of principles and tools which can be applied to characterize or model spatial variation and use that model to optimize the mapping, simulation, and sampling of spatial properties. This chapter provides an introduction to some key ideas in geostatistics, with a particular focus on the kinds of applications which may be of interest for regional scientists.

Keywords

Ordinary kriging · Variogram model · Nugget effect · Conditional simulation · Experimental variogram

1 Introduction

Geostatistics provides a body of techniques which can be used to characterize spatial variation, interpolate and simulate spatial variables, and design optimal spatial sampling strategies (Journel and Huijbregts 1978; Goovaerts 1997; Haining et al. 2010). To date, most applications of geostatistics have been in the physical sciences. However, there is an increasing diversity of research in the social sciences which makes use of geostatistical methods. This chapter reviews some of the key principles underlying geostatistics and considers the contributions these approaches may make in a regional science context.

Underlying classical geostatistics is the random function (RF) model (Journel and Huijbregts 1978; Chilès and Delfiner 1999). An RF is a set of random variables (RVs) which vary as a function of location x. An RV is a stochastic process, a simple discrete example of which is the rolling of a die with the outcome being a value between one and six. Each outcome of this process is termed a realization. In most practical applications of geostatistics, variables are continuous. As an example, in a geostatistical framework, population density z at location x is considered an RV, and the set of RVs z(x_i), i = 1, ..., n, is an RF or, more precisely, since n is finite, a random vector. Observations of population densities are termed regionalized variables (ReVs), and they are treated as stochastic realizations of an underlying RF.

In geostatistics, it is common to fit a stationary RF model (Journel and Huijbregts 1978; Chilès and Delfiner 1999). First, a spatially stationary (i.e., constant) mean parameter is usually defined. However, various alternatives are common in which the mean is allowed to vary across space (Goovaerts 1997), including RF models that include a spatially varying "trend" (e.g., a two-dimensional polynomial or some spatially varying function of covariates; Hudson and Wackernagel 1994). A second parameter (more strictly a model comprising several parameters) that is usually defined is the stationary spatial covariance function (which implies second-order stationarity; see Journel and Huijbregts 1978) or variogram (defined further below, which implies intrinsic stationarity), either of which can be used to


represent the "character" of spatial variation. Since the variogram function is more widely applicable than the spatial covariance function, we will refer to it here and in the remainder of the text. The mean and variogram are, thus, the parameters that define the RF, and it is these parameters that need to be estimated to provide a useful model for geostatistical inference (e.g., prediction of the value of the property of interest at some unobserved location).

One might wonder what utility the variogram brings and why stationarity is required as a modeling decision. We offer the following lay explanation. In geostatistics, the central modeling decision is that the character of spatial variation can be treated as "stationary" (i.e., independent of location). This decision to fit an RF model that is parameterized by a stationary function representing the character of spatial variation is crucial to geostatistical inference. It means that it is possible to (i) lump together the observed spatial variation from different pairs of locations separated by specific distances, and possibly directions (this is essentially what the variogram does); (ii) use that information to infer the character of spatial variation at new pairs of locations; and (iii) crucially, given an observation representing one of a new pair of locations, predict at the other, unobserved location based on the imputed character of spatial variation and, thus, how related we expect the two locations to be.

We now show (i) how the RF model can be estimated through calculation of the empirical or experimental variogram and the fitting of mathematical models with parameters and (ii) how the parameterized RF model can be used for spatial prediction through a technique known as Kriging and in other geostatistical operations.

2 Characterizing Spatial Variation

Most geostatistical analyses proceed with a summary of the properties of the data, and this is generally regarded as good practice. As in any statistical analysis, summary statistics and the histogram may suggest that the data should be transformed in some way (e.g., taking a logarithm so that the transformed distribution is approximately normal). The spatial structure of a variable can be characterized with several structure functions, most commonly the variogram (outlined below).

2.1 The Experimental Variogram

The variogram cloud relates half the squared differences (i.e., the semivariances) between paired data values to the distance (and possibly direction) by which they are separated, and it thus provides a summary of how different observations are as a function of distance. The variogram (or, more properly, semivariogram) collapses the information contained in the variogram cloud; the semivariances within distance bands are


averaged, and so, for example, there is an average semivariance for data pairs separated by 0–10 m, 10–20 m, and so on. The plot of average semivariance against distance band indicates how spatially dependent the variable is. In practice, many properties tend to become more dissimilar as the separation distance increases. The experimental variogram, $\hat{\gamma}(h)$, is estimated from p(h) paired observations, $z(x_\alpha), z(x_\alpha + h), \alpha = 1, 2, \dots, p(h)$, with:

$$\hat{\gamma}(h) = \frac{1}{2p(h)} \sum_{\alpha=1}^{p(h)} \left\{ z(x_\alpha) - z(x_\alpha + h) \right\}^2 \qquad (1)$$

Semivariances can be computed within particular directional tolerances. For example, only data pairs which are aligned approximately north–south could be included in Eq. (1). In this way, it is possible to assess how spatial variation differs as a function of direction. Where the variable of interest is a rate calculated from a numerator and a population-based denominator (e.g., as for disease rate), the standard experimental variogram can be modified easily to account for spatially varying underlying populations. That is, greater weight can be given to zones with large populations, and thus, it is possible to account for spatially varying uncertainties in rates (see Goovaerts 2005; Goovaerts et al. 2005).
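A minimal sketch of the computation in Eq. (1), estimating an omnidirectional experimental variogram by averaging semivariances within distance bands; the synthetic data, lag values, and tolerance below are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

def experimental_variogram(coords, z, lags, tol):
    """Experimental variogram of Eq. (1): average semivariance of data
    pairs whose separation falls within each distance band lag +/- tol."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sv = 0.5 * (z[:, None] - z[None, :]) ** 2
    iu = np.triu_indices(len(z), k=1)        # count each pair once
    d, sv = d[iu], sv[iu]
    return np.array([sv[(d > h - tol) & (d <= h + tol)].mean() for h in lags])

# Toy demonstration on synthetic data.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, (200, 2))
z = np.sin(coords[:, 0] / 20.0) + 0.3 * rng.standard_normal(200)
print(experimental_variogram(coords, z, lags=np.arange(5.0, 50.0, 5.0), tol=2.5))
```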

2.2 Variogram Model Fitting

The variogram may be used as a means in itself to analyze the spatial structure of a variable (e.g., Berberoglu et al. 2000; Lloyd et al. 2004). More commonly, it is the first stage in Kriging-based spatial interpolation, as described below. A model can be fitted to the variogram, for example, by weighted least squares (Cressie 1985; McBratney and Webster 1986). Then the coefficients of the fitted model can be used as an input to the Kriging process. In principle, the variogram model provides information on how much weight is given to observations as a function of their distance from prediction locations. This use of information on spatial structure makes Kriging distinct from other commonly applied methods of interpolation such as inverse distance weighting, whereby weights are a simple function of distance and no account is taken of the specific spatial structure of the data being analyzed.

The most commonly used models are taken from a set of "permissible" or "authorized" models. Some of the most commonly used authorized models are detailed below. The nugget effect model, which indicates measurement error and microscale variation (variation at a distance smaller than the sample spacing), is given by:

$$\gamma(h) = \begin{cases} 0 & \text{for } h = 0 \\ c_0 & \text{for } |h| > 0 \end{cases} \qquad (2)$$


Three of the most frequently used bounded models are the spherical model, the exponential model, and the Gaussian model, and these are each defined below. Each of these models is "bounded" in that each has a maximum value of semivariance defined by a "sill" parameter. The exponential model is given by:

$$\gamma(h) = c \left[ 1 - \exp\left( -\frac{h}{d} \right) \right] \qquad (3)$$

where c is the sill of the exponential model and d is the non-linear distance parameter. The exponential model reaches the sill asymptotically, and the practical range is 3d (i.e., the separation at which approximately 95% of the sill is reached). The spherical model is a very widely used variogram model, and its form corresponds well with what is often observed in real-world studies, with an almost linear growth in semivariance to a particular separation and then stabilization (Armstrong 1998). It is given by:

$$\gamma(h) = \begin{cases} c \left[ 1.5 \dfrac{h}{a} - 0.5 \left( \dfrac{h}{a} \right)^3 \right] & \text{if } h \le a \\ c & \text{if } h > a \end{cases} \qquad (4)$$

where a is the non-linear parameter, known as the range. The Gaussian model is given by:

$$\gamma(h) = c \left[ 1 - \exp\left( -\frac{h^2}{d^2} \right) \right] \qquad (5)$$

As for the exponential model, the Gaussian model does not reach a sill at a finite distance, but rather approaches a defined sill asymptotically; the practical range is $\sqrt{3}\,d$ (Journel and Huijbregts 1978). Variograms with parabolic behavior at the origin, as represented by the Gaussian model here, are indicative of very regular spatial variation (Journel and Huijbregts 1978).

Authorized models can be used in positive linear combination where a single model is insufficient to represent well the form of the variogram (Webster and McBratney 1989). Figure 1 depicts a variogram model comprising the combination of a nugget effect and a spherical component. In practice, the combination of the nugget effect and one or more permissible models is common. Where the experimental variogram does not reach a maximum, it may be desirable to fit a trend model to the data and model the residuals with a bounded model. In some circumstances, it may be desirable to select an unbounded variogram model. The most widely used unbounded model is the power model:

$$\gamma(h) = m h^\omega \qquad (6)$$

where ω is a power, 0 < ω < 2, with a positive slope m (Deutsch and Journel 1998); the linear model is a special case of the power model.
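The authorized models of Eqs. (2)–(6) translate directly into code. A minimal sketch, with parameter names (c0, c, a, d, m, omega) following the notation above:

```python
import numpy as np

# Authorized variogram models of Eqs. (2)-(6); h is a distance or array of distances.
def nugget(h, c0):
    return np.where(h > 0, c0, 0.0)                    # Eq. (2)

def exponential(h, c, d):
    return c * (1.0 - np.exp(-h / d))                  # Eq. (3), practical range 3d

def spherical(h, c, a):
    return np.where(h <= a,
                    c * (1.5 * h / a - 0.5 * (h / a) ** 3), c)   # Eq. (4)

def gaussian(h, c, d):
    return c * (1.0 - np.exp(-h ** 2 / d ** 2))        # Eq. (5), practical range sqrt(3)*d

def power(h, m, omega):
    return m * h ** omega                              # Eq. (6), 0 < omega < 2

# Positive linear combination, e.g. nugget plus spherical as depicted in Fig. 1.
def nugget_plus_spherical(h, c0, c, a):
    return nugget(h, c0) + spherical(h, c, a)
```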


Fig. 1 Bounded variogram model: nugget effect and spherical model (semivariance against lag h, annotated with the nugget c0, sill c, total sill c0 + c, and range a)

2.3 Example

The variogram characterizes the spatial structure in a variable. Where the variable represents some characteristic of a population, the variogram captures information on the dominant spatial scales of variation in that population property. Figure 2 shows variograms estimated from log-ratio transformed data on the percentage of Catholics in Northern Ireland in 1971, 1991, and 2001 for 1-km-square cells. The data are from the Census of Population, and the log ratios are computed as $\frac{1}{\sqrt{2}} \ln\left(\text{Catholics by religion (\%)} / \text{non-Catholics by religion (\%)}\right)$. The percentages were computed with $n_1 + 1$ and $n_2 + 1$, where $n_1$ is the number of Catholics and $n_2$ is the number of non-Catholics for each 1-km cell; this avoids logging zeros and reflects uncertainties in small counts. The data and transformation rationale are described in Lloyd (2010). Briefly, where the variables used sum to a constant (e.g., percentages sum to 100), use of raw values may be problematic, and Aitchison (1986) suggests that use of log ratios is a suitable approach; most applications of such approaches are with respect to the analysis of compositions with multiple parts. The present analysis is based on only one variable (religion or community background) with only two groups (Catholic or non-Catholic), which can thus be expressed as a single ratio. However, Filzmoser et al. (2009) show that univariate statistical methods should not be applied directly to (raw) compositional data. Furthermore, Pawlowsky and Burger (1992) argue that structural analysis should not be conducted using raw proportions or percentages, as the constant-sum constraint may lead to a distorted picture of the spatial covariance structure and possibly erroneous interpretations.

The variograms in Fig. 2 were fitted with a nugget effect and two spherical components. This combination of models reflects nested spatial structures. That is, the variogram models exhibit two breaks corresponding to distances of approximately 5 and 30 km, and these spatial scales are represented by the two model ranges. The ranges and the nugget effects are similar for each of the three census years, while the total sill, which represents the magnitude of variation, for 1971 is much smaller than the sills for 1991 and 2001. These results suggest that the major scales of variation did not change between 1971 and 2001: the areas which were dominated by Catholics or Protestants were broadly similar.
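A minimal sketch of the log-ratio transform described above; the cell counts below are invented, and the +1 adjustment follows the description in the text.

```python
import numpy as np

# Hypothetical counts per 1-km cell: n1 Catholics, n2 non-Catholics.
n1 = np.array([120, 5, 480, 0])
n2 = np.array([380, 495, 20, 0])

# Percentages computed from adjusted counts (n + 1) to avoid logging zeros.
p1 = 100.0 * (n1 + 1) / ((n1 + 1) + (n2 + 1))
p2 = 100.0 - p1

# Log ratio: (1 / sqrt(2)) * ln(Catholic % / non-Catholic %).
log_ratio = np.log(p1 / p2) / np.sqrt(2.0)
print(log_ratio)
```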


Fig. 2 Variogram of religion log ratios for 1971, 1991, and 2001 in Northern Ireland (semivariance against lag, 0–35,000 m)

Fig. 3 Variogram of deaths/1,000 persons for SOAs with fitted model and deconvolved model (experimental semivariances, data model, and deconvolved model against lag, 0–10,000 m)


What did change was the magnitude of differences between places. In support of previous studies of residential segregation in Northern Ireland, the variograms suggest that the concentrations of Catholics and Protestants increased between 1971 and 1991, but there was little change (at the Northern Ireland (NI) scale at least) between 1991 and 2001. In other words, in 1991 Catholic areas were more Catholic than they had been in 1971, while Protestant areas had also become more Protestant. Lloyd (2012) discusses these results in more depth and also estimates variograms locally to enable assessment of how population spatial structures vary across NI. The derivation of local variograms is discussed by Lloyd (2011).

3 Spatial Interpolation with Kriging

3.1 Simple and Ordinary Kriging

Ordinary Kriging (OK) predictions are weighted averages of the n locally available data. The OK weights define the best linear unbiased predictor (BLUP). The OK prediction, $\hat{z}_{OK}(x_0)$, is defined as:

$$\hat{z}_{OK}(x_0) = \sum_{\alpha=1}^{n} \lambda_\alpha^{OK} z(x_\alpha) \qquad (7)$$

with the constraint that the weights, $\lambda_\alpha^{OK}$, sum to one to ensure an unbiased prediction:

$$\sum_{\alpha=1}^{n} \lambda_\alpha^{OK} = 1 \qquad (8)$$

Using the Kriging system, appropriate weights can be determined, and these are multiplied by the available observations before summing the products to obtain the predicted value. These weights are derived given the coefficients of a model fitted to the variogram (or another function such as the covariance function) (Oliver 2010). The Kriging prediction error must have an expected value of zero:

$$E\left[ \hat{Z}_{OK}(x_0) - Z(x_0) \right] = 0 \qquad (9)$$

The Kriging (or prediction) variance, $\sigma_{OK}^2$, is expressed as:

$$\hat{\sigma}_{OK}^2(x_0) = E\left[ \left\{ \hat{Z}_{OK}(x_0) - Z(x_0) \right\}^2 \right] = 2 \sum_{\alpha=1}^{n} \lambda_\alpha^{OK} \gamma(x_\alpha - x_0) - \sum_{\alpha=1}^{n} \sum_{\beta=1}^{n} \lambda_\alpha^{OK} \lambda_\beta^{OK} \gamma\left(x_\alpha - x_\beta\right) \qquad (10)$$

That is, we seek the values of $\lambda_1, \dots, \lambda_n$ (the weights) that minimize this expression with the constraint that the weights sum to one (Eq. 8). This minimization is


achieved through Lagrange multipliers. The conditions for the minimization are given by the OK system comprising n + 1 equations and n + 1 unknowns:

$$\begin{cases} \sum_{\beta=1}^{n} \lambda_\beta^{OK} \gamma\left(x_\alpha - x_\beta\right) + \psi_{OK} = \gamma(x_\alpha - x_0), & \alpha = 1, \dots, n \\ \sum_{\beta=1}^{n} \lambda_\beta^{OK} = 1 \end{cases} \qquad (11)$$

where $\psi_{OK}$ is a Lagrange multiplier. Given $\psi_{OK}$, the prediction variance of OK can be derived with:

$$\hat{\sigma}_{OK}^2 = \sum_{\alpha=1}^{n} \lambda_\alpha^{OK} \gamma(x_\alpha - x_0) + \psi_{OK} \qquad (12)$$
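A minimal sketch that assembles and solves the OK system of Eq. (11) for a single prediction location, returning the prediction of Eq. (7) and the kriging variance of Eq. (12); the spherical variogram parameters in the toy usage are hypothetical.

```python
import numpy as np

def ordinary_kriging(coords, z, x0, gamma):
    """Solve the OK system (Eq. 11) for one prediction location x0; return
    the OK prediction (Eq. 7) and the kriging variance (Eq. 12).
    `gamma` is a fitted variogram model mapping distances to semivariances."""
    n = len(z)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.empty((n + 1, n + 1))
    A[:n, :n] = gamma(d)                 # gamma(x_alpha - x_beta)
    A[:n, n] = A[n, :n] = 1.0            # unbiasedness constraint (Eq. 8)
    A[n, n] = 0.0
    rhs = np.empty(n + 1)
    rhs[:n] = gamma(np.linalg.norm(coords - x0, axis=-1))
    rhs[n] = 1.0
    sol = np.linalg.solve(A, rhs)
    weights, psi = sol[:n], sol[n]       # lambda_alpha and Lagrange multiplier
    return weights @ z, weights @ rhs[:n] + psi

# Toy usage with a spherical variogram (sill 1, range 40; hypothetical values).
spherical = lambda h: np.where(h <= 40.0,
                               1.5 * h / 40.0 - 0.5 * (h / 40.0) ** 3, 1.0)
rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, (30, 2))
z = rng.standard_normal(30)
print(ordinary_kriging(coords, z, np.array([50.0, 50.0]), spherical))
```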

The Kriging variance estimates the prediction variance (i.e., the square of the prediction error) based on the linear algebra given above. It is, thus, a measure of confidence in predictions and is a function of the form of the variogram, the sample configuration, and the sample support (Journel and Huijbregts 1978). The Kriging variance is, however, not conditional on the data values locally, and this has led some researchers to use alternative approaches such as conditional simulation (discussed in the next section) to build models of "spatial" uncertainty (Goovaerts 1997).

There are two standard varieties of OK: punctual OK and block OK. With punctual OK the predictions cover the same area or volume (the support, v; see Atkinson and Tate 2000) as the observations. For example, if observations of some property of a population are made through a grid-based census with a grid cell size of 1 km, as in NI, then predictions made from those data will also have a support of (i.e., represent) 1-km cells. In block OK, the predictions are made to a larger support than the observations. With punctual OK the original observation data are honored. That is, they are retained in the output map. Block OK predictions are averages over areas (i.e., the support has increased). Thus, at x0 the prediction is not the same as an observation and does not need to honor it. In regional science contexts, data are rarely available on point supports, and often a concern may be how to predict from areas to points, a topic discussed below.

The choice of variogram model affects the Kriging weights and, therefore, the predictions. However, if the form of two models is similar at the origin of the variogram, then the two sets of results may be similar (Armstrong 1998). The choice of nugget effect may have marked implications for both the predictions and the Kriging variance. As the nugget effect is increased, the predictions become closer to the global average (Isaaks and Srivastava 1989).

Poisson Kriging provides an appropriate means to interpolate rare events such as mortality. The Poisson Kriging system incorporates the population-weighted mean of the rates (Goovaerts 2005), which accounts for the variability due to population size, with larger weights where the population size is larger and, therefore, where the data may be considered more reliable.


Other forms of Kriging include Kriging with a trend model (where large-scale trends in the data are explicitly taken into account), cokriging (where information on secondary variables is used in the prediction process) (Wackernagel 2003), and indicator Kriging (where the data are transformed into a set of thresholds and the probability of exceeding thresholds is estimated) (Goovaerts 1997).

3.2 Simulation

Kriging predictions are weighted moving averages, and thus, Kriging is a smoothing interpolator. In block Kriging, some amount of smoothing happens naturally through averaging over the larger support, but some occurs due to the interpolation process itself, which effectively extends the support out to include the observations. The latter smoothing is an unwanted artifact of the interpolation process. A means of overcoming this unwanted smoothing is conditional simulation (Journel 1996; Goovaerts 1997; Dungan 1999). With conditional simulation, predictions are drawn from equally probable joint realizations of the RVs which comprise an RF model (Deutsch and Journel 1998). The simulated values are not the expected values (as in Kriging), but rather they are drawn from the conditional cumulative distribution function (ccdf), as a function of the available data, including both the observations and previously simulated data, and the (modeled) spatial variation. Since the approach is stochastic, multiple simulated realizations can be obtained. The range of values obtained represents the "spatial" uncertainty.
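As a sketch of drawing realizations from an RF model, the following generates unconditional Gaussian realizations via a Cholesky factor of the covariance matrix (so-called LU simulation). The conditioning step that makes the simulation honor observations (e.g., via kriging of residuals) is omitted for brevity, and the exponential covariance parameters are hypothetical.

```python
import numpy as np

# Unconditional simulation of a Gaussian RF on a small grid via a Cholesky
# factor of the covariance matrix (LU simulation); exponential covariance
# with hypothetical parameters.
rng = np.random.default_rng(2)
grid = np.array([(i, j) for i in range(20) for j in range(20)], dtype=float)
h = np.linalg.norm(grid[:, None, :] - grid[None, :, :], axis=-1)
C = np.exp(-h / 5.0)                                   # sill 1, distance parameter 5
L = np.linalg.cholesky(C + 1e-9 * np.eye(len(grid)))   # small jitter for stability

# Each column of independent standard normals yields one equally probable
# realization; their spread at a location expresses the "spatial" uncertainty.
realizations = (L @ rng.standard_normal((len(grid), 3))).T
print(realizations.shape)                              # (3, 400)
```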

4 The Change of Support Problem

In regional science contexts, data are often available for zones rather than points. While individual or household level data are available in some contexts, these are usually provided without detailed spatial information, and spatially aggregated data usually offer the only means of exploring detailed spatial patterns. The data support, v, is defined as the geometrical size, shape, and orientation of the units associated with the measurements (Atkinson and Tate 2000). Thus, making predictions from areas to points corresponds to a change of support. Geostatistics offers the means to (i) explore how the spatial structure of a variable changes with change of support and (ii) change the support by interpolation to an alternative zonal system or to a quasi-point support (Schabenberger and Gotway 2005).

4.1 Regularization

For many applications, the variogram defined on a point support is not available. Indeed, it is physically impossible to measure on a point support given that all measurements are integrals over a positive finite support. Thus, only values over a


positive support (area) may be available. The variogram of such aggregated data is termed the regularized or areal variogram (Goovaerts 2008).

4.2 Variogram Deconvolution

Theoretically, given the point support variogram, it is possible to estimate the variogram for any support. Through variogram deconvolution, the point support variogram can be estimated from the variogram estimated from areal data. Atkinson and Curran (1995) derived the point support variogram from the areal support variogram, with the areas defined as regular grids, following an iterative procedure. Of greater relevance for many regional science applications is variogram deconvolution for irregular supports, that is, those cases where the data are available on irregular zones (such as census or administrative zones) rather than regular cells. Goovaerts (2008) presents a procedure for variogram deconvolution given irregular zones, and this method is implemented in the STIS software (see http://www.biomedware.com/).

The above deconvolution method is illustrated here through an example. The case study makes use of data on deaths per 1,000 of the population in Northern Ireland (NI). The death data are counts for 2008, and the denominator is given by the midyear estimates of total population in 2008 for super output areas (SOAs). Figure 3 gives the variogram estimated from deaths/1,000 persons over SOAs. The deconvolved model, derived using the method of Goovaerts (2008), is also given. The application of the deconvolved model for Kriging is illustrated below.

4.3 Area-to-Point Kriging

Given the deconvolution procedures outlined above, it is possible to make predictions at point locations given data defined on areal supports. Kyriakidis (2004) and Goovaerts (2008) show how the Kriging system is adapted in the case of areal data supports and point prediction locations. Area-to-point Kriging is illustrated given the death rate data and the deconvolved variogram detailed in the previous section. Ordinary Kriging with a Poisson population adjustment (2008 midyear estimates for each SOA) was applied with a population denominator of 1,000 (i.e., the deaths are rates per 1,000 of the population). The discretization geography was 1-km cells populated in 2001 (from the 2001 Census of Population), with numbers of persons per cell (2001 Census counts) as weights. The destination geography (locations where estimates are required) was 1-km cells (as for the discretization geography). Figure 4 shows deaths/1,000 persons (a) for SOAs and (b) derived using area-to-point Kriging at 1-km cells. In areas covered by large SOAs (with lower density populations), the increased detail is particularly apparent.


Fig. 4 Deaths/1,000 persons (a) for SOAs and (b) derived using area-to-point Kriging at 1-km cells (only populated cells shown). (Source: 2001 Census, Output Area Boundaries. Crown copyright 2003)

5 Geostatistics and Regional Science

Geostatistics originated in the mining industry through the work of Danie Krige in the 1950s and particularly through Georges Matheron, who developed the Theory of Regionalized Variables in the 1960s and 1970s. From the original mining geology application domain, geostatistics found further application in the dominant field of petroleum geology, particularly through the Stanford Center for Reservoir Forecasting (SCRF) led by Andre Journel, but also in hydrogeology, led by researchers such as Jaime Gómez-Hernández; in soil science, led by the work of Richard Webster (see Webster and Oliver 2000) and Alex McBratney and subsequently Pierre Goovaerts; and in remote sensing and environmental science more generally. Thus, most applications of geostatistics can be divided into mining, petroleum, and "environmental", and indeed, many conferences such as GeoENV organized themselves in this way for many years. However, the potential of geostatistics in the social sciences has been highlighted in recent research.

Much interest has been generated in the use of geostatistics to study a range of population-based properties, including rates based on census attributes, as exemplified in this chapter, and disease outcomes. This has been enabled by two key developments: (i) the pioneering work of Pascal Monestiez and Pierre Goovaerts on Poisson Kriging (Goovaerts 2005; Monestiez et al. 2006), which enables the prediction of counts (e.g., mortality or morbidity) given an underlying population (e.g., at-risk population), and (ii) the parallel development in statistics of what has become known as model-based geostatistics. For example, in relation to the former, population-weighted variograms enable robust estimation in cases where the population underlying the observations is variable in magnitude. Moreover, Poisson Kriging allows the spatial interpolation of variables representing "rare" events, such as mortality.

Model-based geostatistics deserves special mention here as it represents an alternative, Bayesian approach to geostatistics that has been made popular by statisticians such as Peter Diggle and is favored by the epidemiological community. Model-based geostatistics differs from the classical approach presented in this chapter in that the parameters of the RF model are regarded as uncertain, whereas in the classical approach these parameters are estimated through variogram model fitting and then regarded as "fixed" for the purposes of Kriging. As a result, model-based geostatistics conveys naturally the uncertainty in parameter estimation through to the prediction, resulting in a more complete description of prediction uncertainty. The general approach advocated by Diggle and others is two staged: (i) regression is used to predict the variable of interest from covariates, and (ii) geostatistics is used to predict the residuals at unobserved locations based on spatial dependence in those residuals. Since model-based geostatistics naturally allows for the adoption of a range of regression link functions (including the Poisson and logistic functions), the framework is general and can be applied readily to the population-based prediction problems described above. Thus, the development of new approaches, or adaptations of existing methods, has made the potential range of applications of geostatistical methods in regional science much wider.

The utility of area-to-point Kriging is much greater than may at first seem the case. Censuses are the primary means of enumerating the populations of the world's


nations. However, the majority of them use arbitrary areal units to convey the original data, which were measured at the individual level, to the public (e.g., for reasons of preventing disclosure of personal information). Consequently, key variables relating to the world's population are obscured by (i) the aggregation effect, which is effectively the convolution discussed above, and (ii) the zonation effect, which means that alternative realizations of the set of arbitrary boundaries may lead to very different representations. This is the well-known modifiable areal unit problem (MAUP), which has dogged census analysis for decades. Area-to-point Kriging tackles the MAUP head on, allowing researchers to bypass the sampling frame imposed by a particular set of areal units and re-present the census data on any support. While there are obvious limits to what can be achieved (it is not possible to generate more information than one started with), the results are visually striking and do represent an optimal linear solution to the problem (Kyriakidis 2004; Goovaerts 2008). Additionally, applications that require common supports for multiple time periods (e.g., for which zonal systems may differ) may also benefit from recent developments in variogram deconvolution and area-to-point Kriging.

6 Conclusions

This chapter has introduced classical geostatistics and emphasized the advances that have been made in recent years that have opened up the possibility of applying geostatistics in social science and regional science. The ease of access to software which implements modern geostatistical methods means that researchers are increasingly likely to be held back only by their knowledge rather than by a lack of appropriate tools. It is hoped that this chapter will provide a useful starting point for those who wish to explore geostatistical methods in the context of regional science.

Acknowledgments The authors thank the anonymous referees for their comments and thank the editors for their patience while this chapter was finalized. The Northern Ireland Statistics and Research Agency are thanked for access to data. Census output is Crown copyright and is reproduced with the permission of the Controller of HMSO and the Queen's Printer for Scotland. Northern Ireland Statistics and Research Agency, 2001 Census: Standard Area Statistics (Northern Ireland) [computer file]. ESRC/JISC Census Programme, Census Dissemination Unit, Mimas (University of Manchester).

References

Aitchison J (1986) The statistical analysis of compositional data. Chapman and Hall, London
Armstrong M (1998) Basic linear geostatistics. Springer, Berlin
Atkinson PM, Curran PJ (1995) Defining an optimal size of support for remote sensing investigations. IEEE Trans Geosci Remote Sens 33(3):768–776
Atkinson PM, Tate NJ (2000) Spatial scale problems and geostatistical solutions: a review. Prof Geogr 52(4):607–623
Berberoglu S, Lloyd CD, Atkinson PM, Curran PJ (2000) The integration of spectral and textural information using neural networks for land cover mapping in the Mediterranean. Comput Geosci 26(4):385–396
Chilès J-P, Delfiner P (1999) Geostatistics: modeling uncertainty. Wiley, New York
Cressie NAC (1985) Fitting variogram models by weighted least squares. Math Geol 17(5):563–586
Deutsch CV, Journel AG (1998) GSLIB: geostatistical software and user's guide, 2nd edn. Oxford University Press, New York
Dungan JL (1999) Conditional simulation. In: Stein A, van der Meer F, Gorte B (eds) Spatial statistics for remote sensing. Kluwer, Dordrecht, pp 135–152
Filzmoser P, Hron K, Reimann C (2009) Univariate statistical analysis of environmental (compositional) data: problems and possibilities. Sci Total Environ 407(23):6100–6108
Goovaerts P (1997) Geostatistics for natural resources evaluation. Oxford University Press, New York
Goovaerts P (2005) Geostatistical analysis of disease data: estimation of cancer mortality risk from empirical frequencies using Poisson kriging. Int J Health Geogr 4:31
Goovaerts P (2008) Kriging and semivariogram deconvolution in the presence of irregular geographical units. Math Geosci 40(1):101–128
Goovaerts P, Jacquez GM, Greiling D (2005) Exploring scale-dependent correlations between cancer mortality rates using factorial kriging and population-weighted semivariograms. Geogr Anal 37(2):152–182
Haining RP, Kerry R, Oliver MA (2010) Geography, spatial data analysis, and geostatistics: an overview. Geogr Anal 42(1):7–31
Hudson G, Wackernagel H (1994) Mapping temperature using kriging with external drift: theory and an example from Scotland. Int J Climatol 14(1):77–91
Isaaks EH, Srivastava RM (1989) An introduction to applied geostatistics. Oxford University Press, New York
Journel AG (1996) Modelling uncertainty and spatial dependence: stochastic imaging. Int J Geogr Inf Syst 10(5):517–522
Journel AG, Huijbregts CJ (1978) Mining geostatistics. Academic, London
Kyriakidis PC (2004) A geostatistical framework for area-to-point spatial interpolation. Geogr Anal 36(3):259–289
Lloyd CD (2010) Exploring population spatial concentrations in Northern Ireland by community background and other characteristics: an application of geographically weighted spatial statistics. Int J Geogr Inf Sci 24(8):1193–1221
Lloyd CD (2011) Local models for spatial analysis, 2nd edn. CRC Press, Boca Raton
Lloyd CD (2012) Analysing the spatial scale of population concentrations by religion in Northern Ireland using global and local variograms. Int J Geogr Inf Sci 26(1):57–73
Lloyd CD, Berberoglu S, Curran PJ, Atkinson PM (2004) A comparison of texture measures for the per-field classification of Mediterranean land cover. Int J Remote Sens 25(19):3943–3965
McBratney AB, Webster R (1986) Choosing functions for semi-variograms of soil properties and fitting them to sampling estimates. J Soil Sci 37(4):617–639
Monestiez P, Dubroca L, Bonnin E, Durbec J-P, Guinet C (2006) Geostatistical modelling of spatial distribution of Balaenoptera physalus in the Northwestern Mediterranean Sea from sparse count data and heterogeneous observation efforts. Ecol Model 193(3–4):615–628
Oliver MA (2010) The variogram and kriging. In: Fischer MM, Getis A (eds) Handbook of applied spatial analysis. Software tools, methods and applications. Springer, Berlin/Heidelberg/New York, pp 319–352
Pawlowsky V, Burger H (1992) Spatial structure analysis of regionalized compositions. Math Geol 24(6):675–691
Schabenberger O, Gotway CA (2005) Statistical methods for spatial data analysis. Chapman and Hall/CRC, Boca Raton
Wackernagel H (2003) Multivariate geostatistics. An introduction with applications, 3rd edn. Springer, Berlin
Webster R, McBratney AB (1989) On the Akaike information criterion for choosing models for variograms of soil properties. J Soil Sci 40(3):493–496
Webster R, Oliver MA (2000) Geostatistics for environmental scientists. Wiley, Chichester

Spatial Sampling
Eric M. Delmelle

Contents
1 Introduction
1.1 Context
1.2 One-Dimensional Sampling
1.3 Two-Dimensional Sampling
2 Geostatistical Sampling
2.1 Designs for Variogram Estimation
2.2 Optimal Designs to Minimize the Kriging Variance
2.3 Sampling in a Multivariate Context
3 Second-Phase Sampling
4 Numerical Example
5 Search Strategies
6 Conclusion
7 Cross-References
References

Abstract

In spatial sampling, we collect observations in a two-dimensional framework. Careful attention is paid to the quantity of the samples, dictated by the budget at hand, and the location of the samples. A sampling scheme is generally designed to maximize the probability of capturing the spatial variation of the variable under study. Once initial samples of the primary variable have been collected and its variation documented, additional measurements can be taken at other locations. This approach is known as second-phase sampling, and various optimization criteria have recently been proposed to determine the optimal location of these new observations. In this chapter, we review fundamentals of spatial sampling and second-phase designs. Their characteristics and merits under different situations are discussed, while a numerical example illustrates a modeling strategy to use covariate information in guiding the location of new samples. The chapter ends with a discussion on heuristic methods to accelerate the search procedure.

Keywords

Sampling designs · Geostatistics · Covariate information · Sampling objectives

1 Introduction

1.1 Context

Spatial or two-dimensional sampling has been applied in many disciplines, such as mining, soil studies, telecommunications, ecology, geology, and geography, to cite a few (see Haining 1990 for a summary). When collecting spatial data, scientists are usually constrained by the available budget and time to acquire a certain number of samples instead of trying to obtain information everywhere (see, e.g., Müller 1998; Thompson 2002; Delmelle 2009 for various summaries). It is generally desirable to find so-called optimal samples that are as representative as possible of the real data. Not only is the cost of a complete census prohibitive; it is also time-consuming (Haining 1990), and it may result in redundant data when the data are spatially autocorrelated (Griffith 2005). The autocorrelation function is defined as the similarity of the values of the variable of interest as a function of their separating distance (Gatrell 1979). This similarity decreases as the distance among sample points increases. Positive autocorrelation occurs when nearby observations are more alike than samples collected further away. Sparse sampling is less costly, but the variability of the variable of interest may go unnoticed. Consequently, not only the quantity of the samples is important but also their locations.

1.2 One-Dimensional Sampling

Initial work on sampling was devoted to one-dimensional problems (see, e.g., Cochran 1946; Madow 1946, 1953; Madow and Madow 1949). Cochran documented the efficiency associated with random sampling, systematic sampling, and stratified sampling. A random sampling scheme (Fig. 1a) allocates n sample points randomly within a population of interest. Each location has the same probability (is equally likely) of being selected. In stratified random sampling (Fig. 1b), the population is partitioned into a pre-specified number of intervals. For each interval, a number of samples is collected, and the total of all samples is of size n. In a systematic sampling scheme (Fig. 1c), the population of interest is divided into n intervals of similar size. The first element is chosen within the first interval, starting at the origin, and the remaining n − 1 elements are aligned according to the same, fixed interval. A discussion of these configurations in the context of natural resources can be found in Stevens and Olsen (2004).

Fig. 1 One-dimensional sampling schemes for n = 10: (a) random, (b) stratified random, (c) systematic. The x-axis is partitioned into ten intervals for cases (b) and (c). The random sampling locations were generated using the MATLAB rand function

1.3 Two-Dimensional Sampling

Necessary and common to both spatial and non-spatial sampling strategies are (i) the size of the sampling set, which is dictated by the budget at hand, (ii) the configuration of the sampling design, (iii) an estimator to characterize the population, and (iv) an estimation of the sampling variance to compute confidence intervals. Das (1950) documented the variation of the sampling variance of two-dimensional designs.

A simple random sampling design (Fig. 2a) randomly selects a set of m sample points in a study region, generally denoted D, where each location has an equal opportunity to be sampled. In a systematic sampling design (illustrations given in Fig. 2b–d), the study region is discretized into m intervals of equal size Δ. The first element is randomly or purposively chosen within the first interval, and so are other points in the remaining regions. If the first sample is chosen at random, the resulting scheme is called systematic random sampling. When the first sample point is not chosen at random, the resulting configuration is called regular systematic sampling. A centric systematic sampling occurs when the first point is chosen in the center of the first interval; the resulting scheme is a checkerboard configuration. The most common regular geometric configurations are the equilateral triangular grid, the rectangular (square) grid, and the hexagonal grid (Cressie 1991).

The benefits of a systematic approach reside in a good spreading of observations across D, maximizing sampling coverage and preventing sampling clustering and redundancy. This design, however, presents two inconveniences:


Fig. 2 Two-dimensional sampling schemes for n = 100: (a) random, (b) systematic random, (c) systematic centric, (d) systematic unaligned. In (b), (c), and (d), both x- and y-axes have been divided into ten intervals. Points were randomly generated using the MATLAB rand function

(a) The distribution of separating distances in D is not represented well, because many pairs of points are separated by the same distance.
(b) If the spatial process shows evidence of recurrence or periodicities, there is a risk that the variation of the variable will remain uncaptured, because the systematic design coincides in frequency with a regular pattern in the landscape (Overton and Stehman 1993).

A systematic random method addresses the second concern since it combines both systematic and random procedures (Dalton et al. 1975). One observation is randomly selected within each cell. However, the sample density needs to be high enough to document the strength of the spatial relationship (e.g., the variogram) among observations. In Fig. 2b, some patches of D remain undersampled, while other regions show evidence of clustered observations. A systematic unaligned scheme prevents this problem from occurring by imposing a stronger restriction on the random allocation of observations. (A sketch for generating these designs is given below.)

In stratified sampling (Delmelle 2009), the population (or D) is partitioned into non-overlapping strata. A set of samples is collected for each stratum, where the sum of the samples over all strata must equal m (strata may be of different sizes, for instance census tracts). The knowledge of the underlying process is a determining factor in defining the shape and size of each stratum. Smaller strata are preferred in non-homogeneous subregions.

Evaluation of sampling strategies. Following Quenouille's approach of a linear autocorrelation model, stratified random sampling is generally considered to yield a smaller variance than a systematic design. However, if the autocorrelation function is not linear (for instance, exponential), systematic sampling is the most efficient technique, followed by stratified random sampling and random sampling. Overton and Stehman (1993) presented numerical results illustrating the magnitude of the differences among the three aforementioned designs under various population models. When sampling a phenomenon characterized by a regular pattern in the landscape, a systematic unaligned configuration is preferred.
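For readers who wish to reproduce configurations like those of Fig. 2, the following sketch generates the four designs on the unit square; the construction of the systematic unaligned design (random offsets shared along rows and columns) is one standard variant and is an assumption, not necessarily the exact procedure used for the figure.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 10                                # k-by-k grid of cells, n = k*k samples
cells = np.array([(i, j) for i in range(k) for j in range(k)], dtype=float)

# (a) Simple random: every location in D equally likely.
random_xy = rng.uniform(0.0, 1.0, (k * k, 2))

# (b) Systematic random: one point drawn at random inside each cell.
systematic_random_xy = (cells + rng.uniform(0.0, 1.0, (k * k, 2))) / k

# (c) Systematic centric: cell centers (checkerboard configuration).
centric_xy = (cells + 0.5) / k

# (d) Systematic unaligned: random offsets shared along rows and columns,
#     breaking the strict alignment of the regular grid.
dx, dy = rng.uniform(0.0, 1.0, k), rng.uniform(0.0, 1.0, k)
unaligned_xy = np.array([[(i + dx[j]) / k, (j + dy[i]) / k]
                         for i in range(k) for j in range(k)])
```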

2 Geostatistical Sampling

An essential commonality of many natural phenomena is their spatial continuity in geographical space. The field of geostatistics (Matheron 1963) provides a set of regression techniques that capitalize on this spatial continuity to mathematically summarize the spatial variation of the phenomenon and then use this information to predict the phenomenon under study at unsampled locations. Central to geostatistics is Kriging, an interpolation technique that uses the semivariogram, a function which reflects the dissimilarity of pairs of points at different distance lags. The strength of this correlation determines the weighting scheme used to create a prediction surface at unsampled locations, while minimizing the estimation error. As the distance separating two sample points increases, their similarity decreases and the influence on the weighting scheme diminishes. Beyond a specific distance called the range, where autocorrelation is very small, the semivariogram flattens out (see, e.g., Cressie 1991).

Mathematical expression for Kriging. A variable of interest Y is collected at m supports within a study region D. Using data values of the primary variable, an empirical semivariogram $\hat{\gamma}(h)$ summarizes the variance of values separated by a particular distance lag h:

$$\hat{\gamma}(h) = \frac{1}{2d(h)} \sum_{|s_i - s_j| = h} \left\{ y(s_i) - y(s_j) \right\}^2 \qquad (1)$$

where d(h) is the number of pairs of points for a given lag value, and y(s_i) the observation value at location s_i. The semivariogram is characterized by a nugget effect a and a sill σ² where $\hat{\gamma}(h)$ levels out. The nugget effect reflects the spatial dependence at micro scales, caused by measurement errors at distances smaller than the sampling distances (Cressie 1991). Once the lag distance exceeds a certain value r, called the range, there is no spatial dependence between the sample sites anymore; the semivariogram function $\hat{\gamma}(h)$ becomes constant at a value called the sill σ². A model γ(h) is fitted to the experimental variogram, for instance an exponential model:

$$\gamma(h) = \sigma^2\left(1 - e^{-3h/r}\right) \qquad (2)$$

In the presence of a nugget effect a, Eq. (2) becomes:

$$\gamma(h) = a + \left(\sigma^2 - a\right)\left(1 - e^{-3h/r}\right) \qquad (3)$$

Equation (4) denotes the corresponding covariogram C(h), which summarizes the covariance between any two points:

$$C(h) = C(0) - \gamma(h) = \sigma^2 - \gamma(h) \qquad (4)$$
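The following sketch computes the empirical semivariogram of Eq. (1) and evaluates the exponential model of Eq. (2). Pair distances are binned into lag classes, as is common in practice, rather than matched to exact lags; the function and variable names are illustrative, and scipy’s curve_fit is one possible way to fit the sill and range.

```python
import numpy as np
from scipy.optimize import curve_fit

def empirical_semivariogram(coords, y, lags):
    # all pairwise distances; keep each pair once via the upper triangle
    h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    iu = np.triu_indices(len(y), k=1)
    d = h[iu]
    sq = 0.5 * (y[:, None] - y[None, :])[iu] ** 2     # half squared differences
    # average within each lag bin: (1 / 2N) * sum of squared differences
    gamma = [sq[(d >= a) & (d < b)].mean() for a, b in zip(lags[:-1], lags[1:])]
    return 0.5 * (lags[:-1] + lags[1:]), np.array(gamma)

def exp_model(h, sill, rng):
    # exponential semivariogram model of Eq. (2), no nugget
    return sill * (1.0 - np.exp(-3.0 * h / rng))

# one possible fitting route for the sill and range:
# (sill, rng), _ = curve_fit(exp_model, lag_centers, gamma_hat)
```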


The interpolated, kriged value at a location s in space is given as a weighted mean of surrounding values, where each value is weighted according to the variogram model:

$$\hat{y}(s) = \sum_{i=1}^{W} w_i(s)\, y(s_i) \qquad (5)$$

where W is the set of neighboring points used to estimate the interpolated value at location s, and w_i(s) is the weight associated with each surrounding point, a function of the semivariogram. The weight of each sample can be determined by an exponential function (Eq. 2). Usually, kriging is performed on a set of grid nodes s_g (g = 1, 2, . . . , G). Kriging yields an associated kriging variance that measures the prediction uncertainty. The kriging variance at a location s is given by:

$$\sigma_k^2(s) = \sigma^2 - c^{T}(s)\, C^{-1}\, c(s) \qquad (6)$$

where c(s) is the vector of covariances between location s and the sample points, based on the covariogram function, c^T(s) its transpose, and C^{-1} the inverse of the covariance matrix C. The overall kriging variance σ_k² is obtained by integrating Eq. (6) over the region D; computationally, it is easier to perform a spatial discretization of D and average the kriging variance over all grid points s_g:

$$\sigma_k^2 = \int_D \sigma_k^2(s)\, ds \approx \frac{1}{G} \sum_{g \in G} \sigma_k^2\!\left(s_g\right) \qquad (7)$$

The kriging variance can be calculated with an estimated variogram and the known locations of existing sampling points; it depends solely on the spatial dependence and configuration of the observations (Cressie 1991). Figure 3 summarizes the variation in the kriging variance estimate for the four designs of Fig. 2.

Fig. 3 Kriging variance associated with the two-dimensional sampling schemes of Fig. 2: (a) random, (b) stratified random, (c) systematic centric, (d) systematic unaligned


Van Groenigen et al. (1998, 1999) suggest that initial sampling schemes should be optimized for a reliable estimation of the variogram function, which can be used either for the prediction of the variable under study or to help design additional sampling phase(s). For the former, two strategies have been suggested in the literature: (a) a geometric coverage of sample points over the study region is generally desirable, to guarantee enough pairs of points at different distances; (b) points need to be distributed in the multivariate field to capture as much variation as possible. Moreover, optimal sampling strategies exist to reduce the kriging variance associated with the interpolation process. The next sections illustrate three common objectives in spatial sampling: variogram estimation, minimization of the kriging variance, and sampling in a multivariate field.

2.1 Designs for Variogram Estimation

Traditional ways to evaluate the goodness of a sampling scheme do not incorporate the spatial structure of the variable. The increasing use of geostatistics as a least-squares interpolation technique, however, has fostered research on optimizing the sampling configuration to maximize the amount of information obtained during a first sampling phase. Matérn (1960) and Yfantis et al. (1987) suggested that an equilateral triangular sampling grid (Fig. 4) can yield a very reliable estimation of the variogram and of the mean over a study region, assuming radially symmetric, decreasing covariances. Systematic designs (Fig. 2c, d) offer the advantage of good coverage of observations, capturing the main features of the variogram (Van Groenigen et al. 1999). It may be necessary to strategically design a sampling scheme where a subset of the observations is evenly spread across the study area and the remaining points are clustered together to capture the autocorrelation function at very small distances (Delmelle 2009). When the variogram has no nugget effect, the benefits of the optimization procedure are somewhat restrained.

Fig. 4 Three common geometric sampling schemes: (a) square, (b) hexagonal, (c) triangular


In the presence of a nugget effect, a random sampling configuration will score poorly, because random sampling offers limited information at small distances. The reliability of the variogram function depends on the number of pairs of points within each distance band. Russo (1984) and Warrick and Myers (1987) proposed strategies to reproduce an a priori defined, ideal distribution of pairs of points, based on a given variogram function; the procedure accounts for variation in both distance and direction (anisotropy). Corsten and Stein (1994) use a nested sampling design for a better estimation of the nugget effect. A nested sampling design takes observations according to a hierarchical scheme, with decreasing distances between observations. It distributes a high number of observations in some parts of the area and a low observation density in other regions, which in turn generates only a few distances for which variogram values are available.

Taking into account prior information on the spatial structure of the variable, and assuming a stationary variable, Van Groenigen and Stein (1998) combined two different objectives to allocate samples during an initial phase. The first objective, called the Warrick/Myers criterion, ensures optimal estimation of the covariogram and aims at redistributing pairs of points over the distance and direction lags according to a pre-specified distribution. The second criterion, called Minimization of the Mean of the Shortest Distances (MMSD), requires all sampling points to be spread evenly, so that unsampled locations are never far from a sampling point (a sketch follows below). This second criterion is deterministic in nature, resulting in an even spread of points across the study area, similar in character to a systematic pattern.
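The MMSD criterion itself is simple to express in code. A minimal sketch (all names illustrative) computes the mean, over a fine prediction grid, of the distance from each grid node to its nearest sample; smaller values indicate a more even spread.

```python
import numpy as np

def mmsd(samples, grid):
    # samples: (m, 2) sample coordinates; grid: (G, 2) prediction nodes.
    # Distance from every grid node to its nearest sample, then averaged.
    d = np.linalg.norm(grid[:, None, :] - samples[None, :, :], axis=-1)
    return d.min(axis=1).mean()
```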

2.2 Optimal Designs to Minimize the Kriging Variance

The kriging procedure generates a minimum-error estimate of the variable of interest. This uncertainty is minimal (or zero, when there is no nugget effect) at existing sampling points and increases with the distance to the nearest samples. One approach suggested in the literature is to design a sampling configuration that minimizes this uncertainty. Since continuous sampling is not feasible, the search for the best sampling design must be carried out on a discretized grid. Using an a priori variogram model (Eq. 4), it is possible to design an initial sampling scheme S that minimizes the overall kriging variance, Eq. (8), or the maximum kriging variance, Eq. (9):

$$\underset{[s_1,\ldots,s_m]}{\text{MINIMIZE}}\quad J(S) = \frac{1}{G}\sum_{g \in G} \sigma_k^2\!\left(s_g; S\right) \qquad (8)$$

$$\underset{[s_1,\ldots,s_m]}{\text{MINIMIZE}}\quad J(S) = \sup_{g \in G}\, \sigma_k^2\!\left(s_g; S\right) \qquad (9)$$
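A numpy sketch of Eq. (6) under an assumed exponential covariogram, evaluated over a prediction grid so that both the average objective of Eq. (8) and the maximum objective of Eq. (9) can be reported for a candidate design. The box size, sill, and range values are illustrative, not taken from the chapter.

```python
import numpy as np

def cov(h, sill=1.0, rng=3000.0):
    # exponential covariogram C(h) = sigma^2 exp(-3h/r), cf. Eqs. (2) and (4)
    return sill * np.exp(-3.0 * h / rng)

def kriging_variance(samples, grid, sill=1.0, rng=3000.0):
    # sigma_k^2(s_g) = sigma^2 - c(s_g)^T C^{-1} c(s_g), Eq. (6), at all nodes
    dss = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=-1)
    C_inv = np.linalg.inv(cov(dss, sill, rng))
    dgs = np.linalg.norm(grid[:, None, :] - samples[None, :, :], axis=-1)
    c = cov(dgs, sill, rng)                         # one covariance row per node
    return sill - np.einsum("gi,ij,gj->g", c, C_inv, c)

# Objectives (8) and (9) for one candidate design S over a 10 km box:
gen = np.random.default_rng(0)
S = gen.random((20, 2)) * 10_000
xy = np.linspace(0, 10_000, 25)
G = np.dstack(np.meshgrid(xy, xy)).reshape(-1, 2)   # prediction grid
v = kriging_variance(S, G)
print("mean kriging variance:", v.mean(), "max:", v.max())
```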

Burgess et al. (1981) estimate kriging variances for different scenarios of sampling densities, nugget effects, and sizes of study regions. The strategy attempts to identify the minimum number of samples necessary to reach a certain level of kriging variance. General findings are that the prediction error increases as the nugget effect increases or when the sampling density is reduced. An equilateral triangular configuration of sampling points (Fig. 4c) is best under isotropic conditions, but a square grid at the same density is nearly as good and is preferred for data collection convenience. An equilateral triangular design keeps the variance to a minimum because it reduces the farthest distance from initial sample points to non-sample points. A square grid performs well, especially in the case of isotropy (McBratney and Webster 1981; McBratney et al. 1981). When directional discontinuities are present, a square grid pattern is preferred to a hexagonal arrangement (Olea 1984; Yfantis et al. 1987).

2.3 Sampling in a Multivariate Context

McBratney and Webster (1983) discussed the importance of spatial sampling in multivariate fields. Sample data can be very difficult and expensive to collect, especially when monitoring air or soil pollution, for instance (Haining 1990). Secondary data can be a valuable asset if they are available continuously over a study area and can be combined with the primary variable (Hengl et al. 2003). Secondary spatial data sources include maps, digital elevation models, and national socioeconomic and demographic census data. Cross-variograms express the spatial relationships among the different variables; in turn, this information is capitalized upon to calibrate the parameters of the kriging equations. When the variogram of the primary variable and the cross-variograms are known a priori, an improved sampling configuration can be obtained.

A rule of thumb consists of locating samples of the main variable where the covariates exhibit substantial spatial variation (Delmelle and Goovaerts 2009). Secondary variables should be used to reduce the sampling effort in areas where their local contribution to predicting the primary variable is maximum (Delmelle 2009). If a set of covariates accurately predicts the data value where no initial sample has been collected yet, there is little incentive to sample at that location; on the other hand, when covariates perform poorly in estimating the primary variable, additional samples are necessary.

Another example of interest is the following: suppose that we are interested in sampling a primary variable at a total of p + n = o locations, to learn more about its variation across space, but we would also like to document its correlation with secondary variables (for instance, the magnitude of the primary variable is known to vary with covariates). In geography, these secondary variables could be census variables, for instance. This approach has been documented in Whiteman et al. (2018). In that case, p initial samples can be allocated using the Warrick/Myers criterion, guaranteeing wide spatial coverage. Using an optimization solver, it is also possible to locate the initial samples using a p-dispersion model, which maximizes the distance separating any two sites:


$$\text{MAXIMIZE} \quad D \qquad (10)$$

$$\text{SUBJECT TO:} \quad D + \left(M - d_{ij}\right)X_i + \left(M - d_{ij}\right)X_j \le 2M - d_{ij} \qquad \forall\, i,\, j > i \qquad (11)$$

$$\sum_i X_i = p \qquad (12)$$

$$X_i \in \{0,1\} \qquad \forall\, i \qquad (13)$$

with X_i a decision variable equal to one when we sample at i, and zero otherwise (the potential sampling locations can be located on a grid, for instance). The term d_ij is the distance separating two sampling locations i and j and can be calculated prior to optimization. Equation (10) maximizes the distance D between the closest pair of samples i and j. Constraint (11) tracks the distance between two sample sites when both are selected (if either i or j has not been selected, D is forced to be less than or equal to d_ij + M, where M is a very large number). Constraint (12) stipulates that the total number of sites must equal p. Constraints (13) are binary integer constraints. A solver-based sketch follows below.
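Constraints (10)–(13) translate directly into a small mixed-integer program. A sketch using the open-source PuLP interface might look as follows; the grid of candidate sites and the value of p are illustrative assumptions.

```python
import itertools, math
import pulp

coords = [(x, y) for x in range(5) for y in range(5)]   # candidate grid sites
p = 5                                                    # samples to place
d = {(i, j): math.dist(coords[i], coords[j])
     for i, j in itertools.combinations(range(len(coords)), 2)}
M = max(d.values())                                      # "big M" constant

prob = pulp.LpProblem("p_dispersion", pulp.LpMaximize)
X = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(len(coords))]
D = pulp.LpVariable("D", lowBound=0)
prob += D                                                # objective, Eq. (10)
for (i, j), dij in d.items():                            # Eq. (11)
    prob += D + (M - dij) * X[i] + (M - dij) * X[j] <= 2 * M - dij
prob += pulp.lpSum(X) == p                               # Eq. (12); (13) via Binary
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([coords[i] for i in range(len(coords)) if X[i].value() > 0.5])
```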

The remaining n samples are then located in areas that are deemed important (following a coverage model) and representative of the gradient of the k secondary variables. We constrain the model so that, in the final set of o samples, there are at least r samples in each of the q quantiles of each secondary variable, and so on. The coverage model is formulated as follows:

$$\text{MAXIMIZE} \quad \sum_{j \in J} w_j Y_j \qquad (14)$$

$$\text{SUBJECT TO:} \quad \sum_{i \in N_j} X_i \ge Y_j \qquad \forall\, j \in J \qquad (15)$$

$$\sum_i X_i = n \qquad (16)$$

$$\sum_i X_i \ge r \qquad \forall\, q \in Q,\; k \in K \qquad (17)$$

$$X_i \in \{0,1\} \qquad \forall\, i \in I \qquad (18)$$

$$Y_j \in \{0,1\} \qquad \forall\, j \in J \qquad (19)$$

where Y_j is a decision variable equal to one when an area j deemed important for sampling is covered by a sampling site i. Whether a sampling site is selected is unknown; hence, similar to the p-dispersion model, X_i needs to be determined through the optimization method. The model maximizes the importance of location j through a weight w_j, so that samples will be located in regions of higher importance (in the case of contamination, this could be populated areas). Constraint (15) stipulates that an area j is covered only when there is at least one sample i in the vicinity of j; the latter is defined by imposing a radius around each sampling unit i and computing the set N_j of sampling units in the vicinity of area j. Constraint (16) restricts the number of sample sites to n. Constraint (17) stipulates that a minimum number r of sample locations should be selected for each quantile q of each of the k variables. Finally, constraints (18) and (19) are standard integrality constraints.
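The coverage model of Eqs. (14)–(19) can be sketched in the same PuLP style; the coverage sets N_j, the weights, and the single secondary variable with quartile constraints below are all illustrative stand-ins.

```python
import pulp
import numpy as np

rng = np.random.default_rng(1)
I = range(40)                          # candidate sampling sites (hypothetical)
J = range(60)                          # areas to cover
w = rng.random(len(J))                 # importance weight of each area
N = {j: [i for i in I if rng.random() < 0.1] for j in J}  # sites covering j
n, r = 8, 1
Z = rng.random(len(I))                 # one secondary variable, 4 quartiles
quart = np.digitize(Z, np.quantile(Z, [0.25, 0.5, 0.75]))

m = pulp.LpProblem("coverage", pulp.LpMaximize)
X = [pulp.LpVariable(f"x{i}", cat="Binary") for i in I]
Y = [pulp.LpVariable(f"y{j}", cat="Binary") for j in J]
m += pulp.lpSum(w[j] * Y[j] for j in J)                   # Eq. (14)
for j in J:
    m += pulp.lpSum(X[i] for i in N[j]) >= Y[j]           # Eq. (15)
m += pulp.lpSum(X) == n                                   # Eq. (16)
for q in range(4):                                        # Eq. (17)
    m += pulp.lpSum(X[i] for i in I if quart[i] == q) >= r
m.solve(pulp.PULP_CBC_CMD(msg=False))                     # (18)-(19) via Binary
```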

3 Second-Phase Sampling

Second-phase spatial sampling is defined as the addition of new observations to improve the overall prediction of the variable of interest. A set M of m initial measurements has been collected, and a variogram summarizing the spatial structure of the variable of interest helps to determine the size and location of an additional sample set. It is generally agreed in the literature that the objective is to collect new samples so as to reduce the prediction error (kriging variance) by as much as possible.

Mathematical expression for minimizing the kriging variance in a second phase. We add a set of n new sample points to the initial sample set of size m. Using the variogram function from the first sample set, the change in kriging variance Δσ_k² over all grid points s_g is:

$$\Delta\sigma_k^2 = \frac{1}{G}\left[\sum_{g \in G}\sigma_k^{\,old}\!\left(s_g\right) - \sum_{g \in G}\sigma_k^{\,new}\!\left(s_g\right)\right] \qquad (20)$$

where σ_k^old is the kriging variance calculated with the set of [m] initial sample points and σ_k^new the kriging variance with the additional set of [m + n] points. From Eq. (20):

$$\sigma_k^{\,old}\!\left(s_g\right) = \sigma^2 - \underbrace{c\!\left(s_g\right)}_{[1,\,m]}\; \underbrace{C^{-1}}_{[m,\,m]}\; \underbrace{c^{T}\!\left(s_g\right)}_{[m,\,1]} \qquad (21)$$

$$\sigma_k^{\,new}\!\left(s_g\right) = \sigma^2 - \underbrace{c\!\left(s_g\right)}_{[1,\,m+n]}\; \underbrace{C^{-1}}_{[m+n,\,m+n]}\; \underbrace{c^{T}\!\left(s_g\right)}_{[m+n,\,1]} \qquad (22)$$

The objective function helps to locate the set of n additional points that will maximize this change in kriging variance (Christakos and Olea 1992; Van Groenigen et al. 1999; Rogerson et al. 2004). The n additional points are to be chosen from a set of size (N − m), i.e., all possible sample sites in D except the m ones selected during the first sampling phase. In that case, there are $\binom{N-m}{n}$ possible combinations, and it is almost impossible to find the optimal set using combinatorics. The objective function is formulated as:

$$\underset{[s_{m+1},\ldots,s_{m+n}]}{\text{MAXIMIZE}}\quad J(S) = \frac{1}{G}\sum_{g \in G} \Delta\sigma_k^2\!\left(s_g; S\right) \qquad (23)$$

Incorporating secondary information in a second sampling phase. New samples can be collected in areas where secondary variables do not provide good estimates of the primary variable. Consider the situation where the primary data are supplemented by k additional secondary variables X_i (i = 1, . . . , k), available at the G grid nodes s_g (g = 1, 2, . . . , G). Local regression techniques such as geographically weighted regression (GWR) (Brunsdon et al. 1996; see also the chapter ▶ “Geographically Weighted Regression” by Wheeler in this handbook) provide locally linear regression estimates at every point i, using distance-weighted samples. Our goal is to sample in those areas characterized by a low local r², since it is in those areas that the covariates perform poorly in predicting the outcome of the primary variable. A local r² can be conceived as how well the covariates predict the main variable locally, for instance from a GWR model.

Formulating the second-phase sampling problem. This approach was proposed by Cressie and has been applied by Rogerson et al. (2004) and Delmelle and Goovaerts (2009): the kriging variance is weighted by a suitable weighting function w(.), where the importance of a location to be sampled is represented by a location-specific weight w(s):

$$\underset{[s_{m+1},\ldots,s_{m+n}]}{\text{MAXIMIZE}}\quad J(S) = \frac{1}{G}\sum_{g \in G} w\!\left(s_g\right)\, \Delta\sigma_k^2\!\left(s_g; S\right) \qquad (24)$$

The weight should reflect the importance provided locally by covariates, but could also account for the rapid change in spatial structure of the primary variable at sg (Delmelle and Goovaerts 2009).
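A greedy sketch of this weighted second-phase strategy, mirroring the one-point-at-a-time procedure of the numerical example below: at each step the grid node with the largest weighted kriging variance joins the design, and the variance surface is recomputed. It reuses the hypothetical kriging_variance() helper sketched earlier; the weights w are assumed given.

```python
import numpy as np

def second_phase(samples, grid, w, n_new, **kw):
    # samples: (m, 2) initial design; grid: (G, 2) nodes; w: (G,) weights w(s_g)
    S = samples.copy()
    for _ in range(n_new):
        score = w * kriging_variance(S, grid, **kw)   # w(s_g) * sigma_k^2(s_g)
        S = np.vstack([S, grid[np.argmax(score)]])    # add best node, recompute
    return S
```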

4 Numerical Example

A numerical example is provided to gain insight into the structure of the sampling problem. The goal is to maximize the change in the weighted kriging variance. As a hypothetical example, we simulated synthetic snowfall data in a 10 × 10 km bounding box.

Minimizing the kriging variance. Figure 5 displays the initial set of 50 measurements and the associated interpolated map, based on an exponential semivariogram with a range of 3,000 meters, a nugget effect of 0, and a sill of 0.025. The amount of snowfall is simulated to be minimal in the northwestern corner and to increase steadily southeastward.

Fig. 5 Initial data points (left) and interpolated snowfall values (right). Dark colors denote regions characterized by heavier amounts of snow. Units are in feet

Fig. 6 The prediction error (kriging variance, left) and the location of the grid nodes s_g (right)

Figure 6, on the left, is an interpolated contour map of the prediction error. The variance increases away from the existing data points, reaching maximum values in the corners of the study area. The right panel displays the discretized study area, generating a set of P = 51 × 51 potential points. If the goal were to maximize the change in kriging variance only (Eq. 23), the new points would be located far away from existing ones.

Weights to reflect sampling priorities. An example of sampling weights is given on the left of Fig. 7; it is multiplied by the kriging variance on the right. As a result of this multiplication, some locations exhibiting a high weight for second-phase sampling (where we observe stronger variation in the spatial structure of the primary variable) may not be recommended for further sampling, because they are located in the close vicinity of an existing initial sampling point. For instance, in the left panel, the location [670,500; 4,718,500] is characterized by a high sampling weight. However, the region has already been sampled initially, and consequently the likelihood of second-phase sampling there decreases. Consider that we intend to add one sample point that would optimize Eq. (24).

Fig. 7 Combined objectives 2 and 3 (weights, on the left), multiplied with the kriging variance (weighted kriging variance, on the right)

To find the point that would maximize the change in kriging variance, summed over all grid points, an iterative procedure is necessary that evaluates the contribution of each candidate sample point to the objective function. Such an enumeration can be very time-consuming; for computational purposes, it is less demanding to select the point where the weighted kriging variance is maximum. In Fig. 7, the location of the point exhibiting the maximum weighted kriging variance is [670,600; 4,718,200]. Once an optimal (or near-optimal) point has been added, the objective function can be recomputed. It is also desirable to adapt the constraints as the iterations continue.

5 Search Strategies

The set of candidate sampling locations may be large, and it is often desirable to rely on heuristic techniques to return an acceptable solution in a limited time frame. A heuristic guides the search toward a sample set S* that is optimal (or near optimal) with respect to a predefined objective function, for instance the objective J defined in Eq. (24). The efficiency of a heuristic depends on its capacity to return, as often as possible, a solution close to S*. In second-phase sampling, a heuristic that selects points at random would not return a very good value of J. Examples of search techniques include the greedy algorithm, simulated annealing, tabu search, and genetic algorithms, among others (Christakos and Olea 1992; Delmelle and Goovaerts 2009). The greedy algorithm builds a solution sequentially, accepting only improving moves, while the other methods, also called metaheuristics, iterate from an initial solution s0 and also accept non-improving moves; the greedy algorithm therefore usually remains stuck in a local optimum, whereas the metaheuristics may find the optimal solution. Greedy search leads to a unique solution and does not explore the entire set of candidate samples. Simulated annealing occasionally allows the objective function (in our case, to be maximized) to decrease in order to continue exploring for better solutions; note that the simulated annealing algorithm does not always converge. Tabu search and genetic algorithms have been applied to sampling optimization less frequently; both can lead to an optimal solution, yet the tabu search algorithm tends to cycle.
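A minimal simulated-annealing sketch for this setting: a candidate set of n grid nodes is perturbed by swapping one node, and non-improving swaps are occasionally accepted. Here J() stands for any design criterion such as Eq. (24), and every name is illustrative rather than taken from the cited implementations.

```python
import math, random

def anneal(candidates, n, J, iters=2000, t0=1.0, cooling=0.999):
    S = random.sample(candidates, n)              # random initial design
    best, best_val, t = S[:], J(S), t0
    for _ in range(iters):
        trial = S[:]
        trial[random.randrange(n)] = random.choice(candidates)  # swap one node
        delta = J(trial) - J(S)                   # J is to be maximized
        if delta > 0 or random.random() < math.exp(delta / t):
            S = trial                             # accept, sometimes downhill
            if J(S) > best_val:
                best, best_val = S[:], J(S)
        t *= cooling                              # slowly lower the temperature
    return best                                   # duplicates are not guarded
```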

6 Conclusion

Accurate and effective spatial sampling strategies are very important when researchers are limited by their available budget (or time). A careful design is crucial to identify the main features of the phenomenon under study and to avoid leaving its spatial characteristics unnoticed. For instance, incorporating some randomness into a systematic sampling design may be useful to document patterns with periodicities. Once initial samples have been collected, a variogram can be built, which can ultimately help design a second-phase sampling survey (away from existing samples and where the variation is maximum). When the set of candidate locations is large and the objective nonlinear, heuristic methods may be necessary to find a set that is optimal with respect to some sampling criterion. The methods illustrated in this chapter may easily be extended to areal data (for instance, census tracts or socioeconomic strata). Some areas may be deemed more important for sampling, and the proposed objectives are flexible enough to reflect such sampling priorities.

7 Cross-References

▶ Exploratory Spatial Data Analysis
▶ Geographically Weighted Regression
▶ Geostatistical Models and Spatial Interpolation
▶ Spatial Data and Spatial Statistics

References
Brunsdon C, Fotheringham AS, Charlton ME (1996) Geographically weighted regression: a method for exploring spatial nonstationarity. Geogr Anal 28(4):281–298
Burgess TM, Webster R, McBratney AB (1981) Optimal interpolation and isarithmic mapping of soil properties: IV. Sampling strategy. J Soil Sci 32(4):643–659
Christakos G, Olea RA (1992) Sampling design for spatially distributed hydrogeologic and environmental processes. Adv Water Resour 15(4):219–237
Cochran WG (1946) Relative accuracy of systematic and stratified random samples for a certain class of populations. Ann Math Stat 17(2):164–177
Corsten LCA, Stein A (1994) Nested sampling for estimating spatial semivariograms compared to other designs. Appl Stochastic Models Data Anal 10(2):103–122
Cressie N (1991) Statistics for spatial data. Wiley, New York
Dalton R, Garlick J, Minshull R, Robinson A (1975) Sampling techniques in geography. George Philip and Son Limited, London
Das AC (1950) Two-dimensional systematic sampling and the associated stratified and random sampling. Sankhyā 10(1):95–108
Delmelle E (2009) Spatial sampling. In: Rogerson P, Fotheringham S (eds) The SAGE handbook of spatial analysis. Sage, London, pp 54–71
Delmelle E, Goovaerts P (2009) Second-phase spatial sampling designs for non-stationary spatial variables. Geoderma 153(1–2):205–216
Gatrell AC (1979) Autocorrelation in spaces. Environ Plan A 11(5):507–516
Griffith DA (2005) Effective geographic sample size in the presence of spatial autocorrelation. Ann Assoc Am Geogr 95(4):740–760
Haining RP (1990) Sampling spatial populations. In: Haining RP (ed) Spatial data analysis in the social and environmental sciences. Cambridge University Press, Cambridge, pp 171–196
Hengl T, Rossiter DG, Stein A (2003) Soil sampling strategies for spatial prediction by correlation with auxiliary maps. Aust J Soil Res 41(8):1403–1422
Madow LH (1946) Systematic sampling and its relation to other sampling designs. J Am Stat Assoc 41(234):204–217
Madow WG (1953) On the theory of systematic sampling. III. Comparison of centered and random start systematic sampling. Ann Math Stat 24(1):101–106
Madow WG, Madow LH (1949) On the theory of systematic sampling. I. Ann Math Stat 15(1):1–24
Matérn B (1960) Spatial variation. Springer, Berlin/Heidelberg/New York
Matheron G (1963) Principles of geostatistics. Econ Geol 58(8):1246–1266
McBratney AB, Webster R (1981) The design of optimal sampling schemes for local estimation and mapping of regionalized variables: II. Program and examples. Comput Geosci 7(4):331–334
McBratney AB, Webster R (1983) Optimal interpolation and isarithmic mapping of soil properties: V. Co-regionalization and multiple sampling strategy. J Soil Sci 34(1):137–162
McBratney AB, Webster R, Burgess TM (1981) The design of optimal sampling schemes for local estimation and mapping of regionalized variables: I. Theory and method. Comput Geosci 7(4):335–365
Müller W (1998) Collecting spatial data: optimal design of experiments for random fields. Physica-Verlag, Heidelberg
Olea RA (1984) Sampling design optimization for spatial functions. Math Geol 16(4):369–392
Overton WS, Stehman SV (1993) Properties of designs for sampling continuous spatial resources from a triangular grid. Commun Stat 22(9):2641–2660
Rogerson PA, Delmelle EM, Batta R, Akella MR, Blatt A, Wilson G (2004) Optimal sampling design for variables with varying spatial importance. Geogr Anal 36(2):177–194
Russo D (1984) Design of an optimal sampling network for estimating the variogram. Soil Sci Soc Am J 48(4):708–716
Stevens D, Olsen A (2004) Spatially balanced sampling of natural resources. J Am Stat Assoc 99(465):262–278
Thompson SK (2002) Sampling, 2nd edn. Wiley, New York
Van Groenigen JW, Stein A (1998) Constrained optimization of spatial sampling using continuous simulated annealing. J Environ Qual 27(5):1078–1086
Van Groenigen JW, Siderius W, Stein A (1999) Constrained optimisation of soil sampling for minimisation of the kriging variance. Geoderma 87(3–4):239–259
Warrick AW, Myers DE (1987) Optimization of sampling locations for variogram calculations. Water Resour Res 23(3):496–500
Yfantis EA, Flatman GT, Behar JV (1987) Efficiency of kriging estimation for square, triangular and hexagonal grids. Math Geol 19(3):183–205

Exploratory Spatial Data Analysis

Jürgen Symanzik

Contents
1 Introduction
2 Types of Spatial Data
3 Basic Visualization and Exploration Techniques via Maps
3.1 Choropleth Maps
3.2 Linked Micromap Plots
3.3 Conditioned Choropleth Maps
4 ESDA via Linking and Brushing
5 Local Indicators of Spatial Association (LISA)
6 Software for ESDA
6.1 ESDA and GIS
6.2 Stand-Alone Software for ESDA
7 Conclusions
8 Cross-References
References

Abstract

In this chapter, we discuss key concepts for exploratory spatial data analysis (ESDA). We start with its close relationship to exploratory data analysis (EDA) and introduce different types of spatial data. Then, we discuss how to explore spatial data via different types of maps and via linking and brushing. A key technique for ESDA is local indicators of spatial association (LISA). ESDA needs to be supported by software. We discuss two main lines of software developments: GIS-based solutions and stand-alone solutions.

J. Symanzik (*)
Department of Mathematics and Statistics, Utah State University, Logan, UT, USA
e-mail: [email protected]; [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_76


Keywords

Geographic information system · Spatial data · Exploratory data analysis · Spatial point pattern · Modifiable areal unit problem

1 Introduction

In his groundbreaking book from 1977 on exploratory data analysis (EDA), Tukey (1977) made several statements that are still relevant today, more than 40 years after the publication of this book:

“The greatest value of a picture is when it forces us to notice what we never expected to see.” (p. vi)
“Today, exploratory and confirmatory can — and should — proceed side by side.” (p. vii)
“Exploratory data analysis is detective work — numerical detective work — or counting detective work — or graphical detective work.” (p. 1)
“Unless exploratory data analysis uncovers indications, usually quantitative ones, there is likely to be nothing for confirmatory data analysis to consider.” (p. 3)
“Exploratory data analysis can never be the whole story, but nothing else can serve as the foundation stone — as the first step.” (p. 3)

Tukey’s expectations (and limitations) on EDA can easily be extended to exploratory spatial data analysis (ESDA), that is, the exploratory analysis of data with a spatial (geographic) component. As early as 1981, Ripley (1981) followed the distinction between exploratory and confirmatory data analyses in the preface of his book on spatial statistics: “The techniques presented are designed for both of John Tukey’s divisions of exploratory and confirmatory data analysis” (p. vi).

The exact definition of ESDA slightly differs from source to source, but all agree that ESDA is an obvious extension of EDA. Commonly found topics that are covered by ESDA include the visualization and exploration of data in a spatial (geographic) framework. ESDA utilizes many methods, tools, and software components from the field of interactive and dynamic statistical graphics, such as brushing and linked views/linked windows. Typically, one or more map views are linked with one or more statistical displays of the data. Modifications to one of the views will result in modifications of all linked views. Recent uses of ESDA in the health sciences and in the social sciences can be found in Osiecki et al. (2013) and Wilson and Greenlee (2016), respectively.

Questions of interest that ESDA can answer may be whether a cluster of points that can be seen in a scatterplot is related to nearby spatial locations, or whether a particular geographic region (say, the coastal region of a country) exhibits different characteristics than the mountainous region that can be seen in a linked statistical view. Moreover, ESDA can help to create new hypotheses about the underlying spatial data that can later be investigated in more detail in a follow-up study.


Also, ESDA methods should be applied before any advanced modeling and testing of statistical hypotheses. Anscombe (1973) provided some striking examples of what can happen when a linear regression line is blindly fitted to an unsuitable data set. The same is the case if methods from spatial statistics are blindly applied to a spatial data set when no prior exploration took place. We should keep in mind that spatial data often come as large and diverse data sets, rather than homogeneous ones. A large number of different methods usually could, and should, be used, including simple numerical summary statistics. Coming back to Tukey, the goal or expected outcome of the exploration usually is unknown in advance. Moreover, it should be noted that ESDA is more than just an extension of EDA, as additional techniques and methods are needed that incorporate the specific spatial structure of the data. A frequent goal of ESDA is the exploration of spatial autocorrelation: we speak of positive spatial autocorrelation when nearby observations on average are more similar than a random assignment would yield, and of negative spatial autocorrelation when nearby observations on average are more distinct than a random assignment would yield. In the next section, we will discuss the main types of spatial data and, following that, consider basic visualization and exploration techniques via maps. We then discuss two of the key concepts of ESDA: exploration via linking and brushing, and local indicators of spatial association (LISA). A section on software for ESDA, including recently developed R packages, follows. We then finish with a brief conclusion and an outlook on possible future work.

2 Types of Spatial Data

There exist four main types of spatial data:

a. In spatial point patterns, the location of an event is of interest itself. Point patterns can be the locations where a patient died from a particular disease or where some specific animal species have been observed. A question of interest might be to explore the spatial patterns of the deaths, for example, at which locations deaths have been due to disease A and at which locations deaths have been due to disease B.

b. Lattice data, sometimes also called area, areal, or grid data, are data that have been aggregated over some small geographic area. Often, a distinction is made between regular lattices (such as encountered for remote sensing data) and irregular lattices (such as states, counties, or health service areas). In a scenario where different economic regions are compared, a question of interest might be to explore how variables such as educational level, age, and racial composition of the population relate to unemployment in that region.

c. Geostatistical data, sometimes also called spatially continuous data, are data that could, at least theoretically, be observed at any spatial location. However, cost and time determine at how many locations such data actually are collected. Examples of this type of data range from precipitation and temperature measurements to air pollution measurements and readings of minerals in the earth. A question of interest might be to visualize the distribution of nitrates in the soil in a specific region before fitting a smooth surface to the data.

d. Origin–destination flow data, sometimes also called link or spatial interaction data, are data that consist of measurements, each of which is associated with a pair of point locations or a pair of areas. Examples of this type of data are home address and workplace address for inhabitants of a particular city, or originating and destination airports for airline travel. A question of interest might be to explore from which originating airports most passengers, most flights, or most cargo arrive at a particular destination airport.

Cressie (1993) addressed the first three types of spatial data, both from a theoretical and from an applied perspective. When an additional temporal component is available, that is, when spatial data are collected over time, we speak of spatiotemporal data. Origin–destination flow data are discussed in detail in Fischer and Wang (2011, Part II). To a considerable extent, the underlying type of spatial data set determines which ESDA techniques are most suitable. Many of the ESDA techniques discussed in this chapter are suitable for more than one type of spatial data.

3 Basic Visualization and Exploration Techniques via Maps

The first step in exploring spatial data often is to display the data on a map and then to produce several variations of the initial map. Credit needs to be given to John Snow (1813–1858), a British anesthesiologist, who was the first to map disease data. His investigation of the 1854 cholera outbreak in London pioneered the field of epidemiology; nowadays, some consider him the “father of epidemiology,” but the name “grandfather of ESDA” might suit him equally well. The 1854 London cholera outbreak started on August 19, 1854. It lasted about 6 weeks and resulted in more than 575 deaths. Snow (1936) observed: “. . . Mortality in this limited area probably equals any that was ever caused in this country, even by the plague.” Snow’s hypotheses were that cholera was transmitted from person to person via a fecal–oral route and that the drinking water of the Broad Street pump was the cause of the cholera outbreak. Snow utilized his map and empirical evidence to convince the Board of Guardians to remove the handle of the Broad Street pump. A mere 48 fatal attacks occurred following the removal of the handle, indicating that the water feeding the Broad Street pump could indeed have been the source of the cholera epidemic.

As demonstrated by Snow, the visualization of spatial locations, that is, spatial point patterns, can provide valuable insights into such a data set. Moreover, if additional information is available for the locations, such as age, gender, and case/control status, this information can be displayed via different colors, symbols, and symbol sizes in the map display.

3.1 Choropleth Maps

For lattice data and geostatistical data, several types of map displays exist and can be used for exploration. Best known, and most widely used, are choropleth maps. However, choropleth maps depend strongly on choices made by the map creator. Even if the geographic boundaries are fixed, as is the case for lattice data, Monmonier (1996, Chaps. 4 and 10) worked out different visual effects depending on whether the data are split into equal-interval classes or into quartile (or other quantile) classes. The same choices that affect histograms, that is, the starting point and the width of each class interval, also affect choropleth maps (a small sketch of the two classing schemes follows below). Moreover, color choices in choropleth maps have a considerable effect on our perception: a small dark area in an overall bright map may (or may not) be perceived as well as a small white area in an overall dark map. Excellent options for color choices for maps and statistical plots can be obtained from the ColorBrewer software tool (Harrower and Brewer 2003), accessible at http://colorbrewer2.org.

Finally, if geographic boundaries are not fixed in advance, choropleth maps can easily be affected by the modifiable areal unit problem (MAUP) (Openshaw 1984); see the chapter ▶ “Scale, Aggregation, and the Modifiable Areal Unit Problem” by Manley in this handbook for further details. Depending on the boundaries that are used for the aggregation (such as summation or averaging), rather different results for the sums, percentages, or averages may be obtained. Monmonier (1996, Chap. 10) demonstrated the MAUP for the locations of Snow’s cholera data set. Therefore, it is necessary to explore what happens when spatial data are aggregated in different ways.

Given these different sources of bias, it is necessary to create and explore a variety of choropleth maps to understand the underlying spatial patterns. A single choropleth map that is the result of some default setting in a map-producing software package will rarely reveal all details of the underlying spatial data set. Andrienko et al. (2001) discussed how to conduct an exploratory analysis of spatial data via a combination of interactive maps and data mining. Brunsdon and Comber (2019) provided an overview of spatial statistical analyses and mapping using R (R Core Team 2018).
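The effect of the classing choice is easy to see in a short sketch: equal-interval and quantile breaks computed for the same (illustrative, synthetic) skewed variable put very different numbers of regions into each color class.

```python
import numpy as np

vals = np.random.default_rng(2).lognormal(size=200)   # skewed "regional rates"
k = 5
equal = np.linspace(vals.min(), vals.max(), k + 1)    # equal-interval breaks
quant = np.quantile(vals, np.linspace(0, 1, k + 1))   # quantile breaks
print(np.histogram(vals, equal)[0])   # very uneven counts per color class
print(np.histogram(vals, quant)[0])   # 40 regions in each class
```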

3.2 Linked Micromap Plots

Linked micromap (LM) plots (Symanzik and Carr 2008; Carr and Pickle 2010) were introduced in 1996 as an alternative to choropleth maps, especially to overcome some of the limitations of choropleth maps. The basic idea behind LM plots is to link geographic region names and their statistical values with their locations, which are shown in a sequence of small maps, called micromaps. A typical LM plot (see Fig. 1) consists of three to five columns. The first column usually shows the maps, the second column lists some identifier (such as country or state names), and the third to fifth columns contain statistical plots. Each small map highlights a few locations, typically five in a single map.

Fig. 1 LM plots, based on data from the NCI Web page, showing summary values for white male lung cancer mortality rates in the United States for the years 1950–1969 and for the years 1970–1994 in the left data panel, rates and 95% confidence intervals in the middle data panel, and box plots for each of the counties of each state in the right data panel. (Previously published as Fig. 1.6 in Symanzik and Carr (2008, p. 285). Reprinted by permission from Springer Nature, Copyright 2008)


The data are sorted according to some statistical criteria, for example, from highest to lowest (or vice versa), or from highest increase to lowest increase between years 1 and 2. Thus, the topmost map shows the locations with the five largest (or smallest) observations according to the sorting criteria, the next map shows the five locations with the next largest (or smallest) observations, and so on. In case of any spatial association, locations with high or low observations tend to be plotted on the same map or on neighboring maps. The columns with the statistical plots may contain dot plots for each location, confidence intervals, time series plots, or box plots that are based on data for each particular location. Micromaps have been used in print for applications as diverse as comparisons of changing population density and population growth by state, and the visualization and interpretation of birth defects data in Utah and the United States. Typically, a published micromap is the result of many iterations where the authors experimented with different sortings and arrangements of the data panels and multiple possible layouts of the map panel.

While LM plots initially were constructed only for a static representation of the underlying data on paper or a computer screen, interactive versions have been introduced to allow an exploration of the underlying data from multiple perspectives. The US Department of Agriculture (USDA) – National Agricultural Statistics Service (NASS) Research and Development Division released an interactive micromap Web site (previously accessible at http://www.nass.usda.gov/research/sumpant.htm) in September 1999 for the display of data from the 1997 Census of Agriculture; this Web site was accessible for almost two decades. The National Cancer Institute (NCI) released an interactive micromap Web site (https://statecancerprofiles.cancer.gov/) in April 2003 for accessing their cancer data (Wang et al. 2002). However, the pages that allowed the visualization via linked micromap plots were taken offline around 2015. An experimental micromap application (developed by Marcus Beck in 2016) that is based on the shiny R package (http://shiny.rstudio.com/) can be accessed at https://beckmw.shinyapps.io/micromap_app/.

While printed (static) LM plots are most suitable when the number of geographic regions ranges from about 10 to about 100, interactive LM plots may be suitable for several hundred geographic regions. Micromaps at the county level for the 254 counties of Texas at the NCI micromap Web site could reveal some very strong patterns, depending on the data selection. Figure 1 shows a static LM plot with three statistical columns based on data derived from the NCI Web site. The rows in the figure are sorted according to the 1950–1969 white male lung cancer rates in the United States (US), which reveal a strong geographic pattern with high rates in the eastern, southern, and western United States. A next step could be to re-sort the rows with respect to the highest 1970–1994 rates, then with respect to the highest absolute increases from the 1950–1969 to the 1970–1994 rates, and finally with respect to the highest relative increases from the 1950–1969 to the 1970–1994 rates. Moreover, the second and third data columns might be used to display data from possible confounding variables at the state level, such as smoking rates, gender composition, or educational level.
After the exploration of several such LM plots, a researcher likely will have observed many known facts about the spatial distribution of male lung cancer, but, hopefully, some unexpected patterns and relationships also will have emerged.

3.3 Conditioned Choropleth Maps

Conditioned choropleth maps (CCmaps) (Carr et al. 2000; Carr and Pickle 2010) were introduced as a tool for the exploration of spatial data that consist of geographic locations, one dependent variable, and two independent variables. Via sliders, a researcher can interactively partition each of the two independent variables and the dependent variable into three intervals each. A 3 × 3 set of panels containing nine partial maps shows the color-coded level (high, medium, low) of the dependent variable: for those geographic locations that relate to high values of variable one and high values of variable two in map one, for those that relate to high values of variable one and medium values of variable two in map two, and so on. For example, in an agricultural setting, variable one might be the amount of fertilizer, variable two might be the amount of precipitation, and the dependent variable might be the yield of a crop; one might expect that high values for fertilizer and precipitation result in a large yield. The nine maps show the relationship among the three variables in a geographic framework, thus allowing the consideration of the underlying spatial structure of the data and not only the statistical relationships. Cutoff values can be changed interactively, thus allowing the investigation of many possible settings. CCmaps are useful tools for the interactive generation of statistical hypotheses for medical, epidemiological, and environmental applications. In Fig. 2, the 1997 soybean production in the United States is conditioned on acreage and yield. In an interactive environment, slider settings can be further modified to identify geographic areas of interest on the nine maps.

4 ESDA via Linking and Brushing

While in the previous section map views were interactively manipulated in a rather direct way, we will discuss in this section how map views and associated statistical displays can be interactively manipulated via linked views and brushing. This is commonly understood as the classical idea of EDA and ESDA. For a detailed discussion of concepts for interactive graphics, such as brushing, linked brushing, linked views, focusing, zooming, panning, slicing, rescaling, reformatting, rotations, projections, and the grand tour, the reader is referred to Symanzik (2012, Sect. 12.3). Main statistical plot types that can frequently be found as components of linked views include histograms, scatterplots, scatterplot matrices, the grand tour, parallel coordinate plots, bar charts, pie charts, spine plots, mosaic plots, ray–glyph plots, and cumulative curves (such as the Lorenz curve). Most of these plot types also were discussed in Symanzik (2012, Sect. 12.3). Figure 3 shows one map view that has been linked with two scatterplots. In addition, plots for spatial data, such as variogram–cloud plots (see Fig. 4) and spatially lagged scatterplots, can be components of the linked views. The overall idea of brushing is to mark different subsets of the data in a particular plot with different colors, symbols, sizes, or point or line styles.


Fig. 2 CCmaps, based on data from the USDA–NASS Web page, related to soybean production in the United States. The plot shows the dependent variable production (top slider) that is conditioned on the two independent variables acreage (bottom slider) and yield (right slider). (Previously published as Fig. 1.7 in Symanzik and Carr (2008, p. 289). Reprinted by permission from Springer Nature, Copyright 2008)

This is usually done based on the visual appearance of patterns in a specific plot, for example, outliers that seem to be far away from the remaining points in a histogram or scatterplot, or clusters that seem to be well separated from each other. In the next plot that is being produced, the original assessment will be reevaluated, additional points may be marked, or points may be marked differently. In the framework of linked brushing and linked views, the brushing information is carried over from one plot to the next. For example, outliers that are marked in a histogram or scatterplot will be marked in a similar way (with the same colors, symbols, sizes, or point or line styles) in all related plots, in particular on a map view as well. Monmonier (1989) introduced the term geographic brushing in reference to interacting with the map view of geographically referenced data. In Fig. 3, the US cities with the highest index for education have been brushed in the left scatterplot, and cities with a high crime index have been identified by name in the same scatterplot. The map view shows the locations of these cities with the same color and symbols as in the scatterplot. Moreover, the scatterplot on the right reveals that a high crime index is associated with a high recreation index, while a high education index is associated with a medium recreation index. Nothing striking is noticeable when comparing the brushed values for education and crime with the arts index.


Fig. 3 Screenshot of the “Places” data in ArcView/XGobi. A map view of 329 cities in the United States is displayed in ArcView at the top. The two XGobi windows at the bottom are showing scatterplots of crime (horizontal) versus education (vertical) (left) and recreation (horizontal) versus arts (vertical) (right). Locations of high crime have been brushed and identified, representing some of the big cities in the United States. Also, locations of high education (above 3,500) have been brushed, mostly representing locations in the northeastern United States. All displays have been linked. (Previously published as Fig. 12.1 in Symanzik (2012, p. 339). Reprinted by permission from Springer Nature, Copyright 2012)

Extensions of brushing for spatial data, such as moving statistics, or brushing applied to origin–destination flow data (Liu and Marble 1997), exist. In advanced software environments, brushing can take place in any of the linked views, including the map view. So, when locations in a specific geographic region are marked, the other statistical views will reveal whether there is some possible statistical relationship among the data from the selected locations as well, for example, whether the statistical values are similar to each other or whether they span the entire range of the underlying data distribution. Linked brushing is not always one-to-one between the different displays. In a variogram–cloud plot, the absolute difference (or a related measure) of a variable of interest is calculated for all pairs of spatial locations, and this measure is plotted against the Euclidean distance between the two associated points (up to a cutoff distance chosen by the researcher).


Fig. 4 Example of a variogram–cloud plot that is linked to a map view, based on precipitation data for the northeastern United States. In the upper left XGobi window, we have brushed (using a solid rectangle) the highest values in the variogram-cloud plot. In the lower right ArcView map view, each pair of locations, related to a point that has been brushed in XGobi, has been connected by a line. (Previously published as Fig. 2 in Symanzik et al. (2000, p. 477), Journal of Computational and Graphical Statistics. Copyright © American Statistical Association, https://www.amstat.org, reprinted by permission of Taylor & Francis Ltd., https://www.tandfonline.com, on behalf of the American Statistical Association)

Thus, when brushing one point in a variogram–cloud plot, this needs to be translated into a pair of spatial locations that are brushed in the map view. Figure 4 shows such a link for precipitation measurements in the northeastern United States. In this figure, the highest values in the variogram–cloud plot (up to the cutoff distance) have been brushed. The map view shows two points, that is, spatial locations, that are connected to several other spatial locations. The location in the northeast likely is a spatial outlier, as its precipitation measurements are considerably different (either higher or lower) from those of all nearby locations. A next step would be to explore additional variables for these locations, starting with elevation.


The location in the southwest likely is not a spatial outlier; rather, there is some considerable local variation happening in this region as this location is only connected to some, but by far not all, locations in its neighborhood.
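Variogram clouds of this kind can also be computed directly in R with the gstat package (listed in Sect. 6.2). The following is a minimal sketch, using gstat's bundled meuse data as a stand-in, since the precipitation data shown in Fig. 4 are not bundled with any package:

library(sp)       # provides the meuse example data
library(gstat)    # variogram() with the cloud = TRUE option
data(meuse)
coordinates(meuse) <- ~x+y        # promote the data frame to spatial points
# one row per pair of locations: 'dist' is their separation and 'gamma'
# is half the squared difference of the variable of interest
vc <- variogram(log(zinc) ~ 1, meuse, cloud = TRUE)
head(vc[order(-vc$gamma), ])      # the pairs an analyst would brush first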

5 Local Indicators of Spatial Association (LISA)

Moran's I statistic is a well-known measure of spatial autocorrelation at the global level for lattice data. Anselin (1995) introduced a local Moran statistic, a local Gamma statistic, a local Geary statistic, a Moran scatterplot, and other LISA statistics to assess the spatial association at a location i. The LISA statistics make it possible to identify local spatial clusters and to assess local instability. Moreover, they allow one to assess the influence of a single location on the corresponding global statistic, a feature that is as important as being able to identify influential points in a regression framework. LISA statistics are probably the most frequently applied ESDA technique, with applications in areas as diverse as regional science, spatial econometrics, epidemiology, the social sciences, and criminology. Despite their wide use, one should keep in mind that LISA statistics are exploratory in nature, and, usually, additional steps are required to confirm the initial results derived from them. Ord and Getis (2012) introduced a local spatial heteroscedasticity (LOSH) statistic, an extension of local spatial autocorrelation statistics, to measure spatial variability.
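For illustration, local Moran statistics and the Moran scatterplot are available in the spdep package (listed in Sect. 6.2). A minimal sketch, assuming the columbus sample data that ship with the spData package (an assumption; this section analyzes no specific data set):

library(spdep)
data(columbus, package = "spData")       # 49 Columbus, OH neighborhoods
col.listw <- nb2listw(col.gal.nb, style = "W")   # row-standardized weights
# one local Moran statistic per areal unit, with expectation, variance,
# z-score, and p-value, following Anselin (1995)
lisa <- localmoran(columbus$CRIME, col.listw)
head(lisa)
moran.plot(columbus$CRIME, col.listw)    # the accompanying Moran scatterplot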

6 Software for ESDA

ESDA is highly dependent on software that supports various types of statistical displays and map views and that allows linked brushing. Two main approaches have been developed during the last 30 years: conducting ESDA in software environments where a geographic information system (GIS) is linked to statistical software packages, and conducting it in stand-alone statistical software solutions. A more detailed overview of various software solutions for ESDA has been provided in Symanzik (2012, Sect. 12.6.1).

6.1 ESDA and GIS

Fotheringham (1992) pointed out that it is not necessary to conduct an exploratory spatial data analysis within a GIS, but that in many circumstances, using a GIS to do so might simplify the exploration of the data and provide insights that otherwise might be missed. Therefore, over the next decade, several researchers developed software that linked GIS with statistical software, or they added statistical features to existing GIS.

Exploratory Spatial Data Analysis

1857

In Anselin (1994), a series of ESDA techniques was discussed in the context of a GIS, with the primary focus on exploring the spatial nature of the underlying data. These techniques can be classified as techniques based on the neighborhood view of spatial association (such as Moran scatterplots and LISA statistics) and techniques based on the distance view of spatial association (such as spatially lagged scatterplots and variogram–cloud plots). Various software links between GIS, such as Arc/Info, ArcView, and Grassland, and one or more statistical software packages implemented several of these techniques. Among the links that were developed and maintained over a longer time period were those between Arc/Info (and later ArcView) and SpaceStat (Anselin et al. 1993) and those between ArcView, XGobi, and XploRe (Symanzik et al. 2000). One major limitation of such software links is that whenever one of the individual software packages is modified with respect to the functionality of the link, the other software packages have to be modified accordingly.

6.2 Stand-Alone Software for ESDA

In contrast to linking GIS and statistical software, several software developers focused on the development of stand-alone statistical software that also supports map views of the spatial locations that are linked with statistical displays. Some of the best-known examples are Spider (Haslett et al. 1990), REGARD (Unwin et al. 1990), and, more recently, GeoDa (Anselin et al. 2006). One major limitation of stand-alone software for ESDA is that the functionality that is usually available in a GIS has to be reimplemented in a statistical software package. Java-based software for the creation of linked micromap plots, developed by the NCI, can be freely downloaded at https://gis.cancer.gov/tools/micromaps/.

In recent years, the statistical software environment R (R Core Team 2018) has become the lingua franca of statistics. Since its appearance around 1996 (Ihaka and Gentleman 1996), R has been further advanced by thousands of creators of contributed packages (more than 13,500 in February 2019) that provide all kinds of additional functionality beyond the original R base functionality. This includes packages for maps, color selections, EDA and ESDA, and advanced statistical functionality for spatial data, such as the following:

- maptools (http://cran.r-project.org/web/packages/maptools/index.html), which allows one to read and manipulate geographic data, in particular ESRI shapefiles,
- maps (http://cran.r-project.org/web/packages/maps/index.html), which provides access to a variety of maps,
- ggmap (http://cran.r-project.org/web/packages/ggmap/index.html), which builds on the popular R visualization package ggplot2 and allows the display of spatial data and models on static Google and Stamen maps,
- tmap (http://cran.r-project.org/web/packages/tmap/index.html), which offers a flexible and layer-based approach to create thematic maps, including choropleth and bubble maps,
- RgoogleMaps (http://cran.r-project.org/web/packages/RgoogleMaps/index.html), which allows one to query the Google server for static maps and to use one of the Google maps as a background image to overlay statistical plots from within R,
- micromapST (http://cran.r-project.org/web/packages/micromapST/index.html) for the quick creation of linked micromap plots, with a focus on geographic regions of the United States,
- micromap (http://cran.r-project.org/web/packages/micromap/index.html) for the creation of linked micromap plots, supporting the use of shapefiles for regions from around the world,
- RColorBrewer (http://cran.r-project.org/web/packages/RColorBrewer/index.html), the R implementation of http://colorbrewer2.org, for good color choices for maps and other plots,
- iplots (http://cran.r-project.org/web/packages/iplots/index.html), an R package in the spirit of Spider and REGARD, for interactive plots in R, including maps,
- sf (http://cran.r-project.org/web/packages/sf/index.html), simple features that provide a standardized way to encode spatial vector data in R,
- splancs (http://cran.r-project.org/web/packages/splancs/index.html) for the exploration and analysis of spatial and space–time point patterns,
- spatstat (http://cran.r-project.org/web/packages/spatstat/index.html) for the exploration and analysis of spatial data, mainly spatial point patterns,
- spdep (http://cran.r-project.org/web/packages/spdep/index.html) for the analysis of spatial dependence at a local and global scale, including Moran, LISA, and LOSH statistics,
- geoR (http://cran.r-project.org/web/packages/geoR/index.html) for the exploration and analysis of geostatistical data,
- gstat (http://cran.r-project.org/web/packages/gstat/index.html) for modeling, prediction, and simulation of spatial and spatiotemporal geostatistical data, and
- spgwr (http://cran.r-project.org/web/packages/spgwr/index.html) for computing geographically weighted regression.

While the Web pages listed above provide detailed user guides and information on how to use each of these packages, Bivand (2010) demonstrated how many of the ESDA techniques described in this chapter can be performed in R. An extended overview of additional R packages for the reading, exploration, visualization, and analysis of spatial data can be found at http://cran.r-project.org/web/views/Spatial.html.
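As a small taste of the mapping packages listed above, a choropleth map takes only two lines of tmap code. A minimal sketch, assuming tmap version 3 syntax and its bundled World layer:

library(tmap)
data(World)                  # an sf polygon layer shipped with tmap
# choropleth of the Happy Planet Index, with a ColorBrewer palette
tm_shape(World) + tm_polygons("HPI", palette = "RdYlGn")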

7 Conclusions

In this chapter, we have provided an overview of techniques, methods, and software solutions for ESDA. Most of the developments took place during the last 30–35 years. Due to the rapid development of computer hardware, including high-quality graphic displays, over the last few decades, ESDA techniques are nowadays easily accessible to many researchers on a wide variety of hardware platforms.


A current hotspot for ongoing development of ESDA techniques is R and its thousands of contributed packages. For a few decades, software packages for exploratory data analysis were relatively weak for confirmatory data analysis (using John Tukey's terms here) and vice versa. However, R is continuously getting stronger for both types of data analyses, and it is able to handle a large variety of GIS data formats. It can be expected that in the near future, exploratory and confirmatory data analyses will be conducted almost simultaneously in R or some similar software environment. Once a researcher detects something of interest in a spatial data set via ESDA, a confirmatory analysis can immediately follow, and once a confirmatory analysis has been conducted, ESDA can be used to further explore the spatial fit of the fitted model, its residuals, and so on. A trend in recent years has been to provide access to spatial data for everyone via Web interfaces. This includes the previously introduced Web sites for interactive micromaps, but, even more, Web-based software such as gapminder (Rosling and Johansson 2009), accessible at http://www.gapminder.org/. The Google version, called Google Public Data Explorer, accessible at http://www.google.com/publicdata/directory, might become a tool that provides easy and fast access to EDA and ESDA techniques for millions of Web users. Other promising recent directions are Web-based maps and visualizations via the shiny (http://shiny.rstudio.com/), leaflet (http://rstudio.github.io/leaflet/), and plotly (http://plot.ly/r/maps/) R packages and their combinations, such as http://rstudio.github.io/leaflet/shiny.html and http://plot.ly/r/shiny-gallery/.
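For example, an interactive Web map can be produced from a few lines of R via leaflet; the coordinates below are an arbitrary illustration:

library(leaflet)
leaflet() %>%
  addTiles() %>%                                 # OpenStreetMap background
  addCircleMarkers(lng = -111.81, lat = 41.74,   # hypothetical location
                   popup = "example location")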

8 Cross-References

▶ Scale, Aggregation, and the Modifiable Areal Unit Problem
▶ Spatial Clustering and Autocorrelation of Health Events
▶ Web-Based Tools for Exploratory Spatial Data Analysis

References
Andrienko N, Andrienko G, Savinov A, Voss H, Wettschereck D (2001) Exploratory analysis of spatial data using interactive maps and data mining. Cartogr Geogr Inform Sci 28(3):151–165
Anscombe FJ (1973) Graphs in statistical analysis. Am Statistician 27(1):17–21
Anselin L (1994) Exploratory spatial data analysis and geographic information systems. In: Painho M (ed) New tools for spatial analysis. Eurostat, Luxembourg, pp 45–54
Anselin L (1995) Local indicators of spatial association – LISA. Geogr Anal 27(2):93–115
Anselin L, Dodson RF, Hudak S (1993) Linking GIS and spatial data analysis in practice. Geogr Sys 1(1):3–23
Anselin L, Syabri I, Kho Y (2006) GeoDa: an introduction to spatial data analysis. Geogr Anal 38(1):5–22
Bivand RS (2010) Exploratory spatial data analysis. In: Fischer MM, Getis A (eds) Handbook of applied spatial analysis: software tools, methods and applications. Springer, Berlin/Heidelberg, pp 219–254
Brunsdon C, Comber L (2019) An introduction to R for spatial analysis & mapping, 2nd edn. Sage, London
Carr DB, Pickle LW (2010) Visualizing data patterns with micromaps. Chapman & Hall/CRC, Boca Raton
Carr DB, Wallin JF, Carr DA (2000) Two new templates for epidemiology applications: linked micromap plots and conditioned choropleth maps. Stat Med 19(17–18):2521–2538
Cressie NAC (1993) Statistics for spatial data, revised edn. Wiley, New York
Fischer MM, Wang J (2011) Spatial data analysis: models, methods and techniques. Springer, Berlin/Heidelberg/New York
Fotheringham AS (1992) Exploratory spatial data analysis and GIS. Environ Plann A 24(2):1675–1678
Harrower MA, Brewer CA (2003) ColorBrewer.org: an online tool for selecting color schemes for maps. Cartogr J 40(1):27–37
Haslett J, Wills G, Unwin A (1990) SPIDER – an interactive statistical tool for the analysis of spatially distributed data. Int J Geogr Inform Syst 4(3):285–296
Ihaka R, Gentleman R (1996) R: a language for data analysis and graphics. J Comput Graph Stat 5(3):299–314
Liu L, Marble D (1997) Brushing spatial flow data sets. In: 1997 proceedings of the Section on Statistical Graphics. American Statistical Association, Alexandria, pp 67–72
Monmonier M (1989) Geographic brushing: enhancing exploratory analysis of the scatterplot matrix. Geogr Anal 21(1):81–84
Monmonier M (1996) How to lie with maps, 2nd edn. University of Chicago Press, Chicago
Openshaw S (1984) The modifiable areal unit problem. In: Concepts and techniques in modern geography no. 38. Geo Books, Norwich
Ord JK, Getis A (2012) Local spatial heteroscedasticity (LOSH). Ann Reg Sci 48(2):529–539
Osiecki KM, Kim S, Chukwudozie IB, Calhoun EA (2013) Utilizing exploratory spatial data analysis to examine health and environmental disparities in disadvantaged neighborhoods. Environ Justice 6(3):81–87
R Core Team (2018) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org/
Ripley BD (1981) Spatial statistics. Wiley, New York
Rosling H, Johansson C (2009) Gapminder: liberating the x-axis from the burden of time. Stat Comput Stat Graph Newslett 20(1):4–7
Snow J (1936) Snow on cholera: being a reprint of two papers by John Snow, M.D. together with a biographical memoir by B. W. Richardson, M.D. and an introduction by Wade Hampton Frost, M.D. The Commonwealth Fund/Oxford University Press, New York/London
Symanzik J (2012) Interactive and dynamic graphics. In: Gentle JE, Härdle WK, Mori Y (eds) Handbook of computational statistics – concepts and methods, 2nd edn. Springer, Berlin/Heidelberg, pp 335–373
Symanzik J, Carr DB (2008) Interactive linked micromap plots for the display of geographically referenced statistical data. In: Chen C, Härdle W, Unwin A (eds) Handbook of data visualization. Springer, Berlin/Heidelberg, pp 267–294, & 2 color plates
Symanzik J, Cook D, Lewin-Koh N, Majure JJ, Megretskaia I (2000) Linking ArcView and XGobi: insight behind the front end. J Comput Graph Stat 9(3):470–490
Tukey JW (1977) Exploratory data analysis. Addison Wesley, Reading
Unwin A, Wills G, Haslett J (1990) REGARD – graphical analysis of regional data. In: 1990 proceedings of the Section on Statistical Graphics. American Statistical Association, Alexandria, pp 36–41
Wang X, Chen JX, Carr DB, Bell BS, Pickle LW (2002) Geographic statistics visualization: web-based linked micromap plots. Comput Sci Eng 4(3):90–94
Wilson B, Greenlee AJ (2016) The geography of opportunity: an exploratory spatial data analysis of U.S. counties. GeoJournal 81(4):625–640


Further Reading
Bao S, Anselin L (1997) Linking spatial statistics with GIS: operational issues in the SpaceStat–ArcView link and the S+Grassland link. In: 1997 proceedings of the Section on Statistical Graphics. American Statistical Association, Alexandria, pp 61–66
Carr DB, Chen J, Bell BS, Pickle LW, Zhang Y (2002) Interactive linked micromap plots and dynamically conditioned choropleth maps. In: Proceedings of the second national conference on digital government research, Digital Government Research Center (DGRC), pp 61–67. http://www.dgrc.org/conferences/2002_proceedings.jsp
Cook D, Majure JJ, Symanzik J, Cressie N (1996) Dynamic graphics in a GIS: exploring and analyzing multivariate spatial data using linked software. Comput Stat 11(4):467–480. Special issue on computer-aided analysis of spatial data
Kahle D, Wickham H (2013) ggmap: spatial visualization with ggplot2. R Journal 5(1):144–161
Payton QC, McManus MG, Weber MH, Olsen AR, Kincaid TM (2015) micromap: a package for linked micromaps. J Stat Softw 63(2). http://www.jstatsoft.org/v63/i02/
Pickle LW, Pearson JB Jr, Carr DB (2015) micromapST: exploring and communicating geospatial patterns in U.S. state data. J Stat Softw 63(3). http://www.jstatsoft.org/v63/i03/
Tennekes M (2018) tmap: thematic maps in R. J Stat Softw 84(6). http://www.jstatsoft.org/v84/i06/
Unwin A (1994) REGARDing geographic data. In: Dirschedl P, Ostermann R (eds) Computational statistics. Physica-Verlag, Heidelberg, pp 315–326

Spatial Autocorrelation and Moran Eigenvector Spatial Filtering

Daniel Griffith and Yongwan Chun

Contents
1 Introduction
  1.1 Spatial Autocorrelation: A Conceptual Overview
  1.2 Spatial Filtering: A Conceptual Overview
2 Moran Eigenvector Spatial Filtering
  2.1 Eigenvectors of a Spatial Weight Matrix
  2.2 An Eigenvector Spatial Filter Specification of the Linear Regression Model
  2.3 The Numerical Calculation of Eigenvectors
  2.4 Statistical Features of Moran Eigenvector Spatial Filtering
3 Comparisons with the Spatial Lag Term
4 An Empirical Example: A Moran Eigenvector Spatial Filtering Methodology Example
5 Extensions to Spatial Interaction Data Analysis
6 Extensions to Space-Time Data Analysis
7 Conclusions
8 Cross-References
Appendix A
References

Abstract

This chapter provides an introductory discussion of spatial autocorrelation (SA), which refers to correlation existing and observed in geospatial data, and which characterizes data values that are not independent but rather are tied together in overlapping subsets within a given geographic landscape. This chapter summarizes the various interpretations of SA, with particular emphasis on its explanation as map pattern. SA can be quantified in a number of different ways, one being with the Moran Coefficient. Spatial filtering is a statistical method whose goal is to obtain enhanced and robust results in a spatial data analysis by decomposing a


spatial variable into trend, a spatially structured random component (i.e., a spatial stochastic signal), and random noise. Its aim is to separate spatially structured random components from both trend and random noise and, consequently, to lead statistical modeling to sounder statistical inference and useful visualization. This separation procedure can involve eigenfunctions of the matrix version of the numerator of the Moran Coefficient. This chapter summarizes the Moran eigenvector spatial filtering (MESF) conceptual material and presents computer code for implementing MESF in R, Matlab, MINITAB, FORTRAN, and SAS. Next, it demonstrates the statistical features of MESF. Finally, it summarizes a MESF empirical example application and the extensions of MESF to spatial interaction modeling and space-time data analysis.

Keywords
Spatial autocorrelation · Moran eigenvector spatial filtering · Spatial weight matrix · Regression analysis · Spatial interaction · Space-time data

D. Griffith · Y. Chun (*)
Geospatial Information Sciences, School of Economic, Political and Policy Sciences, University of Texas at Dallas, Richardson, TX, USA

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_72

1 Introduction

Spatial autocorrelation (SA) refers to correlation existing and observed in geospatial data – data collected together with their locational position tags on a two-dimensional (2-D) surface. This concept describes data values that are not independent but rather are tied together in overlapping subsets within a given geographic landscape. Relative location, commonly expressed in terms of geographic closeness, partly determines SA. Tobler's (1969, p. 7) first law of geography provides a generalized view of how phenomena occur across space within this context: "Everything is related to everything else, but near things are more related than distant things." This description indicates that a data value at a given location tends to be (dis)similar to those values for the same variable at nearby locations. For example, house price and house value comparisons among effectively equivalent neighboring houses partly govern a real estate market; that is, real-world phenomena are not likely to occur in a random fashion but rather in a spatially structured manner over space. Griffith (1992) points out that, in practice, spatial scientists essentially interpret SA in one of nine different ways: self-correlation, map pattern, a diagnostic tool, a missing variables surrogate, a spatial process mechanism, a spatial spillover effect, an outcome of areal unit demarcation (re the MAUP), redundant information, and a nuisance parameter. Discussion in this chapter exploits the map pattern interpretation of SA.

1.1 Spatial Autocorrelation: A Conceptual Overview

Spatial autocorrelation (SA, Getis 2010) can be interpreted in different ways. First, it can be interpreted literally as self-correlation which arises in 2-D space. Unlike the


conventional Pearson's product moment correlation coefficient that measures co-variability of paired values in two variables, SA measures correlation among paired values in a single variable based on relative spatial locations. Because it focuses on a tendency among values of a variable based on their spatial closeness, SA is measured within the combinatorial context of all possible pairs of observed values for a given variable, where corresponding weights that are determined by spatial closeness identify the pairings of interest. Second, SA can be interpreted as map pattern. In regional science, an analysis often involves datasets of individual observations post-stratified by geographical units such as census blocks/block groups/tracts, county boundaries, and country borders. The choropleth mapping of a variable using such areal units portrays a pattern over space. A tendency for similarity or dissimilarity for neighboring values on such a map can be directly interpreted as SA. While large clusters of similar values on a map indicate positive SA, an alternating map pattern, in which the tendency is for values to be dissimilar to those of their neighbors, constitutes negative SA. Third, SA can be interpreted as a diagnostic tool for misspecification in statistical modeling such as linear regression analysis. For example, detected SA among residuals in such a regression analysis can suggest that the linear relationship specified between a response variable and a covariate needs to be replaced with a nonlinear functional description of the relationship. Cliff and Ord (1981, p. 209) furnish a useful illustration of this situation by employing an empirical example in which the 1961 population as a percentage of the 1926 population (Y) by Eire county (n = 26) is cast as a function of arterial road network accessibility (X). Table 1 summarizes selected results from their analysis, supplemented with an extension. As the nonlinear relationship between Y and X is better articulated, SA displayed by the corresponding regression residuals decreases to a clearly statistically nonsignificant level. Finally, SA also can imply that variables are missing from a regression model specification, in which case it serves as a surrogate for unexplained variation in the response variable in question. When a variable is missing from a model specification that, indeed, accounts for a substantial amount of variation in a dependent variable, then the unexplained variation may occur in the form of SA among residuals. Griffith and Chun (2016) provide an extensive discussion of such omitted variable cases. If a missing variable forms an underlying map pattern – that is, significant SA exists – then residuals are very likely to exhibit a conspicuous level of SA (e.g., Páez 2018).

Table 1 The 1981 Cliff-Ord example, with an extension

Model specification         (Pseudo-)R²   Residual spatial autocorrelation z-score
Y = α + βX + ε              0.404         1.87
Y = αX^β e^ε                0.517         1.41
Y = α + β ln(X − δ) + ε     0.735         0.76


Research in the spatial sciences in general, and in geography and regional science in particular, has addressed the numerical quantification of SA. The Moran Coefficient (MC) is the most commonly used quantitative measure of SA; it is analogous to the Pearson's correlation coefficient. The MC may be written as

$$ MC = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}\,(y_i - \bar{y})(y_j - \bar{y}) \Big/ \sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}}{\sum_{i=1}^{n} (y_i - \bar{y})^2 \big/ n} = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}} \cdot \frac{\sum_{i=1}^{n} (y_i - \bar{y}) \left[ \sum_{j=1}^{n} c_{ij}\,(y_j - \bar{y}) \right]}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{1} $$

where c_ij is a non-negative spatial weight tying together the pair of areal units i and j, n is the number of areal units, and ȳ is the arithmetic mean of response variable Y. The value of c_ij reflects spatial closeness between areal units i and j. The numerator of the MC contains the pairs of values (y_i − ȳ) and Σ_j c_ij(y_j − ȳ), whose graphic portrayal is the Moran scatterplot. In most cases, the MC utilizes spatial connectivity information for a set of areal units. That is, in binary form, c_ij = 1 when areal units i and j have a common boundary, and zero otherwise; the rook's case definition of spatial closeness is when all common boundaries have nonzero length (i.e., they are lines), whereas the queen's case definition is when some also have zero length boundaries (i.e., they are points).

The range of the MC differs from that of Pearson's correlation coefficient, which covers the interval [−1, 1]. Mathematical quantities of the n-by-n matrix of c_ij values, C, called eigenvalues, determine the exact range of a MC (de Jong et al. 1984). Often, the largest value is slightly larger than one, and the smallest value is between −1 and −0.5. Statistical hypothesis testing involving the MC can be based upon a normal distribution assumption; Cliff and Ord (1981) derive its sampling distribution theory as such. The expected value for the MC is −1/(n − 1) under the null hypothesis of zero SA. This expected value is the threshold value separating positive and negative SA. That is, while an MC value that is greater than this expected value indicates positive SA, one that is less than this expected value indicates negative SA. Although equations for the standard deviation of the MC are rather complicated (see Cliff and Ord 1981 for details), it can be well approximated by $\left( 2 \left( \sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij} \right)^{-1} \right)^{1/2}$ when n ≥ 20 (Griffith 2010); the summation term, $\sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}$, counts the number of ones in matrix C, which equals twice the number of neighbors in a surface partition (i.e., a link from areal unit i to areal unit j plus a link from areal unit j to areal unit i). A z-score calculated for the MC with the foregoing expected value and standard deviation furnishes a basis for statistical inference that asymptotically follows the standard normal distribution.
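Written out in base R, Eq. (1) and the accompanying z-score are only a few lines. The following is an illustrative sketch, not code from the chapter:

# MC per Eq. (1): C is an n-by-n binary contiguity matrix, y a numeric vector
moran_coef <- function(y, C) {
  n  <- length(y)
  yc <- y - mean(y)                                  # (y_i - ybar)
  (n / sum(C)) * as.numeric(t(yc) %*% C %*% yc) / sum(yc^2)
}
# z-score using E[MC] = -1/(n - 1) and the sd approximation (2 / sum(C))^(1/2)
moran_z <- function(y, C) {
  (moran_coef(y, C) + 1 / (length(y) - 1)) / sqrt(2 / sum(C))
}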

1.2 Spatial Filtering: A Conceptual Overview

Spatial filtering (Griffith et al. 2019) is a statistical method whose goal is to obtain enhanced and robust results in a spatial data analysis. The fundamental idea is based upon a decomposition of a spatial variable into the following three components: trend, a spatially structured random component (i.e., spatial stochastic signal), and random noise. Its aim is to separate spatially structured random components from both trend and random noise and, consequently, to lead statistical modeling to sounder statistical inference and useful visualization. This separation procedure involves sophisticated mathematical operators and produces the aforementioned eigenvalues as well as their accompanying eigenvectors (the couplings are called eigenfunctions). A limited number of spatial filtering methods currently exist, including autoregressive linear operators, Getis's Gi-based specification, interpoint distance matrix eigenfunctions, and spatial weight matrix eigenfunctions. These methods utilize different constructed mathematical operators to decompose a spatial variable into the three components. First, following the Cochrane-Orcutt procedure for time series analysis, autoregressive linear operators can be used to prewhiten a dependent variable to render independent and identically distributed (iid) random errors (Haining 1991). This spatial filtering method takes the matrix form (I − ρC), where ρ is an SA parameter and I is an n-by-n identity matrix. With the parameter ρ estimated solely from a variable Y, a prewhitened dependent variable is generated by (I − ρC)Y. Second, Getis's Gi-based method converts each spatially correlated variable into two variates, one capturing SA and the other containing nonspatial systematic and random effects (Getis and Griffith 2002). Regression with spatial and nonspatial variates enables a separation of SA components from trend and white noise components. Third, Borcard and Legendre (2002) propose a procedure called principal coordinates of neighbor matrices (PCNM) that exploits eigenfunctions of the n-by-n matrix of truncated geographic distances among locations [truncation can be at the (effective) range of SA as identified by, say, a semivariogram]. They use eigenvectors corresponding to positive eigenvalues as spatial descriptors in regression or canonical analysis. They argue that this method can be applied to any set of locations providing a good coverage of a given geographic landscape. Fourth, Moran eigenvector spatial filtering (MESF) methodology utilizes eigenvectors extracted from a spatial weight matrix C (Griffith 2003). Details of this latter methodology are discussed in the next sections because this chapter focuses on the MESF method. Spatial filtering methods are increasingly utilized in regression settings, although they can be applied to individual variables, too. In particular, the ESF methodology has been utilized in different model specifications, including linear regression (Getis and Griffith 2002), generalized linear models (Lee et al. 2018), and space-time panel models (Patuelli et al. 2011; Hu et al. 2018). One recent interesting utilization of spatial filtering methodology explores negative SA (NSA). Because NSA rarely is observed in empirical georeferenced data,


studies predominantly focus on positive SA (PSA). The literature about NSA still is relatively limited. In addition, an NSA component often is mixed with (or hidden by) more prominent PSA patterns (Griffith 2006). Hence, NSA is not easily observed with a single global SA measure and, furthermore, does not tend to be easily recognized by popular spatial regression model specifications that posit a single SA parameter. However, spatial filtering methodology, especially MESF, furnishes a useful way to detect and account for hidden NSA components in modeling georeferenced data.

2 Moran Eigenvector Spatial Filtering

MESF uses a set of synthetic proxy variables, extracted as eigenvectors from a spatial weight matrix C that ties geographic objects together in space, and adds these vectors as control variables to a model specification. These control variables identify and isolate the stochastic spatial dependencies among the georeferenced observations, thus allowing model building to proceed as if the observations were independent.

2.1 Eigenvectors of a Spatial Weight Matrix

The MESF method utilizes a mathematical decomposition, whose results are called eigenfunctions, of the following transformed spatial weight matrix:

$$ \left( \mathbf{I} - \mathbf{1}\mathbf{1}^{\mathrm{T}}/n \right) \mathbf{C} \left( \mathbf{I} - \mathbf{1}\mathbf{1}^{\mathrm{T}}/n \right) \tag{2} $$

where 1 is an n-by-1 vector of ones, and superscript T denotes the matrix transpose operator. The decomposition generates n eigenvectors and their associated n eigenvalues. In descending order, the n eigenvalues can be denoted as λ = (λ₁, λ₂, λ₃, . . ., λₙ). That is, the eigenvalues range between the largest eigenvalue, λ₁, which is positive, and the smallest eigenvalue, λₙ, which is negative. The corresponding n eigenvectors can be denoted as E = (E₁, E₂, E₃, . . ., Eₙ), where each eigenvector, Eⱼ, is an n-by-1 vector. In matrix notation, the decomposition can be expressed as

$$ \mathbf{M}\mathbf{C}\mathbf{M} = \mathbf{E}\boldsymbol{\Lambda}\mathbf{E}^{\mathrm{T}} \tag{3} $$

where M = (I − 11ᵀ/n), the projection matrix that centers a variable and that commonly appears in standard multivariate statistical theory, and Λ is an n-by-n diagonal matrix whose diagonal elements are the set of n eigenvalues (λ₁, λ₂, λ₃, . . ., λₙ). The eigenfunctions have some important properties. First, the eigenvectors are mutually orthogonal and uncorrelated (Griffith 2003): the symmetry of matrix C ensures orthogonality, and the projection matrix M ensures eigenvectors with


zero means, guaranteeing uncorrelatedness. That is, EᵀE = I and Eᵀ1 = 0, and the correlation between any pair of eigenvectors, say Ei and Ej, is zero when i ≠ j. Second, the eigenvectors portray distinct, selected map patterns. Tiefelsdorf and Boots (1995) show that each eigenvector portrays a different map pattern exhibiting a specified level of SA when it is mapped onto the n areal units associated with the corresponding spatial weight matrix C. They also show that the MC value for a mapped eigenvector is equal to a function of its corresponding eigenvalue (i.e., MC_j = (n/1ᵀC1) · λ_j for E_j). Third, given a spatial weight matrix, the feasible range of MC values is determined by the largest and smallest eigenvalues, that is, by λ₁ and λₙ (de Jong et al. 1984). Based upon these properties, the eigenvectors can be interpreted as follows (Griffith 2003):

The first eigenvector, E1, is the set of real numbers that has the largest MC value achievable by any set of real numbers for the spatial arrangement defined by the spatial weight matrix C; the second eigenvector, E2, is the set of real numbers that has the largest achievable MC value by any set that is uncorrelated with E1; the third eigenvector, E3, is the set of real numbers that has the largest achievable MC value by any set that is uncorrelated with both E1 and E2; the fourth eigenvector is the fourth such set of values; and so on through En, the set of real numbers that has the largest negative MC value achievable by any set that is uncorrelated with the preceding (n − 1) eigenvectors.

As such, these eigenvectors furnish distinct map pattern descriptions of latent SA in spatial variables, because they are mutually both orthogonal and uncorrelated.

Figure 1 presents selected eigenvectors generated with the rook's definition of adjacency used to construct a 0–1 binary spatial weight matrix for the 73 municipalities of Puerto Rico. Figure 1a portrays the boundaries of the 73 municipalities together with their neighboring structure: the red lines indicate that c_ij for a pair of connected municipalities equals 1; otherwise, it equals 0. Figure 1b is a map of the first eigenvector, E1, which portrays the maximum positive SA possible for the established matrix C. Its MC value is 1.0938. This map pattern shows two large clusters of areal units, one of positive numbers and one of negative numbers. Figure 1c portrays the map pattern of E18, whose MC value is 0.2832. It still indicates positive SA, but with a degree much weaker than that for eigenvector E1. It depicts smaller-sized areal unit clusters of similar values. In contrast, Fig. 1d is the map of eigenvector E73, which portrays the extreme negative SA possible with this spatial tessellation. Its pattern alternates positive and negative values, and its MC is −0.5657.

Fig. 1 The maps of selected eigenvectors based upon matrix C constructed for the municipalities of Puerto Rico: (a) the boundaries and links of the 73 municipalities, (b) the first eigenvector, E1, (c) the 18th eigenvector, E18, and (d) the nth eigenvector, E73

Puerto Rico's east–west elongation coupled with its irregular municipality surface partitioning somewhat obscures the conspicuousness of map patterns portrayed by the eigenvectors of matrix C. This notion is more clearly depicted by the eigenvectors of a regular square tessellation. Consider a landscape comprising 900 pixels forming a 30-by-30 square tessellation (Fig. 2). Figure 2a portrays the boundaries of the 900 pixels together with their neighboring structure. Figure 2b portrays a global SA pattern: it contains a northwest-southeast swath crossing a northeast-southwest swath of similar values, with a relative MC value of 0.9922. Figure 2c portrays a regional SA pattern: it contains 36 moderate-sized clusters of similar values, with a relative MC value of 0.8314. Figure 2d portrays a more local SA pattern: it contains 100 small clusters of similar values, with a relative MC value of 0.5358. In all three cases, appropriate spatial aggregation would yield NSA (i.e., negative MC values).

Fig. 2 The maps of selected eigenvectors based upon matrix C constructed for the pixels of a 30-by-30 square tessellation: (a) the boundaries and links of the 900 pixels, (b) eigenvector E2,2, (c) eigenvector E6,6, and (d) eigenvector E10,10
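The properties listed above are easy to verify numerically. A base-R sketch, using a hypothetical five-unit chain of areal units as matrix C:

# hypothetical chain: areal unit i shares a boundary with unit i + 1
C <- matrix(0, 5, 5); C[cbind(1:4, 2:5)] <- 1; C <- C + t(C)
n <- nrow(C)
M <- diag(n) - matrix(1/n, n, n)                 # projection matrix M
eig <- eigen(M %*% C %*% M, symmetric = TRUE)
E <- eig$vectors
round(crossprod(E), 3)        # identity matrix: mutually orthogonal vectors
(n / sum(C)) * eig$values[1]  # MC of E1, via MC_j = (n / 1'C1) * lambda_j
e1 <- E[, 1]                  # direct computation gives the same MC value
(n / sum(C)) * sum(e1 * (C %*% e1)) / sum((e1 - mean(e1))^2)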

2.2 An Eigenvector Spatial Filter Specification of the Linear Regression Model

MESF methodology utilizes the eigenvectors calculated in the previous section. In detail, it accounts for SA with a linear combination of the eigenvectors, which constitutes the constructed eigenvector spatial filter (ESF). In linear regression, the ESF model specification may be written as

$$ \mathbf{Y} = \mathbf{X}\boldsymbol{\beta}_X + \mathbf{E}_k\boldsymbol{\beta}_k + \boldsymbol{\varepsilon} \tag{4} $$

where X is an n-by-(p + 1) matrix containing covariates, β_X is the corresponding (p + 1)-by-1 vector of regression parameters, E_k is an n-by-k matrix containing k eigenvectors, β_k is the corresponding k-by-1 vector of regression parameters, and ε ~ N(0, σ²I) is an n-by-1 error vector whose elements are i.i.d. normal random variates. Because the ESF, E_kβ_k, accounts for SA, the MESF linear regression specification does not suffer from SA in its residuals. Adding the eigenvectors to the regression equation does not change the expected conditional mean of Y, because the mean of each eigenvector is zero.

Initializing MESF methodology requires the identification of a feasible set of eigenvectors. This procedure involves two steps. In the first step, a candidate set of eigenvectors, which is a noticeably smaller subset (i.e., K ≪ n) of the n eigenvectors, is identified. In the second step, the eigenvectors to include in the ESF are selected from this candidate set (e.g., with a stepwise procedure; see Sect. 2.4).

2.3 The Numerical Calculation of Eigenvectors

2.3.1 R Code for Generating MCM Eigenvectors

The R package spdep provides the functionality needed to handle spatial neighbor structures; it first needs to be loaded:

> library(spdep)

For the first step, the spatial neighbor structure needs to be loaded from an existing file (e.g., PuertoRico.gal). A .GAL file can be created with GeoDa software. The function read.gal() reads a .GAL file and stores its neighbor structure in the .nb object format of spdep:

> pr.nb <- read.gal("PuertoRico.gal")
> pr.listb <- nb2listw(pr.nb, style = "B")
> B <- listw2mat(pr.listb)

These lines convert the neighbor structure into a 0–1 binary spatial weight matrix B. The next lines construct the projection matrix M and the matrix MBM (i.e., matrix MCM in the notation above):

> n <- nrow(B)
> M <- diag(n) - matrix(1/n, n, n)
> MBM <- M %*% B %*% M

The final lines compute the eigenfunctions of MBM and retain only those eigenvectors with MCj/MC1 > 0.25. The third line changes the column names of the EV objects:

> eig <- eigen(MBM, symmetric = TRUE)
> EV <- as.data.frame(eig$vectors[, eig$values / eig$values[1] > 0.25])
> colnames(EV) <- paste("EV", 1:ncol(EV), sep = "")

The EV objects can be further utilized in a regression analysis in R, or can be exported in a different file format, such as CSV, for other purposes.

Computer Code 1 Sample Matlab code for generating eigenvectors from a spatial weight matrix

2.3.2 Matlab Code for Generating MCM Eigenvectors

Matlab does not have a built-in function to read a .GAL file, but the spatial neighbor structure in a .GAL file can be imported with Matlab functions to open and read a file. The Computer Code 1 window provides sample Matlab code to generate eigenvectors. The first five lines of code (code block 1.1) open the PuertoRico.gal file in order to store all file contents in the lines variable and then close the file. The next four lines of code (code block 1.2) retrieve the number of observations, n, which is stored in the first line of a .GAL file, and then create an n-by-n matrix of all zeros for a binary spatial weight matrix, B. The for statement code block (code block 1.3) updates the matrix B by assigning 1s for spatial neighbors from the PuertoRico.gal file. The final six lines of code (code block 1.4) generate eigenvalues and eigenvectors from matrix MCM and then select only eigenvectors with MCj/MC1 > 0.25.

2.3.3 MINITAB Code for Generating MCM Eigenvectors

MINITAB supports matrix operations and works with the 0–1 binary spatial weight matrix. The Computer Code 2 window provides sample MINITAB code for computing eigenvectors. The first block of code reads in the spatial weight matrix. The next block of code constructs matrix M. The third block of code computes MCM. The fourth block of code calculates the eigenvalues and their associated eigenvectors. The fifth block of code converts the eigenvalues to MCs. The final block of code identifies for which eigenfunctions MCj/MC1 > 0.25.

READ 73 73 M1;
FILE 'C:\PR-CONN.TXT';
FORMAT(2(5X,30F2.0,/),5X,13F2.0).

LET K1=73
SET C1
K1(1)
END
COPY C1 M2
TRANS M2 M3
MULT M2 M3 M2
LET K3=-1/K1
MULT K3 M2 M2
DIAG C1 M3
ADD M3 M2 M2

MULT M2 M1 M3
MULT M3 M2 M3

EIGEN M1 C1
EIGEN M3 C2 M2

SET C4
K1(1)
END
MULT M1 C4 C4
LET C3=K1*C2/SUM(C4)
LET C4=C3/MAX(C3)
SET C5
1:K1
END

COPY C5 C6;
USE C4 = 0.25:1.
LET K10=K1+100
COPY M2 C101-CK10
PRINT C6
END

Computer Code 2 Sample MINITAB code for generating eigenvectors from a spatial weight matrix


2.3.4 FORTRAN Code for Generating MCM Eigenvectors

The IMSL subroutines package supports reliable matrix operations for FORTRAN. The Computer Code 3 window provides sample FORTRAN code for computing eigenvectors. The first part of the code reads in a spatial neighbors' file, similar to a .GAL file; its structure is id, number of neighbors, and list of neighbor ids. The first line of this file contains n. The next block of code constructs matrix MCM.


      USE MSIMSL
      IMPLICIT DOUBLE PRECISION (A-H,O-Z)
      INTEGER INEIGH(3000,75),NI(3000)
      DOUBLE PRECISION C(3000,3000),W(3000,3000),ID(3000),EVAL(3000),MC,
     CMCMAX
C     Read the spatial neighbors file and build binary matrix C
      OPEN(5,FILE='C:\INPUT.NEI')
      READ(5,*) N
      SUMC=0.0D0
      DO 5 I=1,N
      READ(5,*) ID(I),NI(I),(INEIGH(I,J),J=1,NI(I))
    5 SUMC=SUMC+REAL(NI(I))
      DO 20 I=1,N
      DO 18 J=1,N
   18 C(I,J)=0.0D0
      DO 19 J=1,NI(I)
      C(I,INEIGH(I,J))=1.0D0
   19 CONTINUE
   20 CONTINUE
C     Construct matrix MCM
      DO 40 I=1,N
      DO 39 J=1,N
   39 W(I,J)=C(I,J)-REAL(NI(I))/REAL(N)-REAL(NI(J))/REAL(N)
     C +SUMC/REAL(N**2)
   40 CONTINUE
C     Eigenfunctions; keep eigenvectors with MCj/MC1 > 0.25
   49 CALL DEVCSF(N,W,3000,EVAL,C,3000)
      MCMAX=0.0D0
      DO 50 I=1,N
      MC=(REAL(N)/SUMC)*EVAL(I)
      IF(MC.GE.MCMAX) MCMAX=MC
   50 WRITE(7,1000) ID(I),EVAL(I),MC
      K=0
      DO 60 J=1,N
      IF((REAL(N)/SUMC)*EVAL(J)/MCMAX.LE.0.25D0) GOTO 60
      WRITE(6,*) J
      K=K+1
      DO 59 I=1,N
      W(I,K)=C(I,J)
   59 CONTINUE
   60 CONTINUE
      WRITE(6,*) 'NUMBER OF PROMINENT EIGENVECTORS = ',K

Computer Code 3 Sample FORTRAN code for generating eigenvectors from a spatial weight matrix

Subroutine DEVCSF calculates the eigenvalues and their associated eigenvectors. The final block of code identifies for which eigenfunctions MCj/MC1 > 0.25.

2.3.5 SAS Code for Generating MCM Eigenvectors

The IML language facilitates eigenfunction calculations by SAS. The Computer Code 4 window provides sample SAS code for computing eigenvectors. As with the preceding FORTRAN code, SAS works with the spatial weight matrix as input. The first part of the code constructs matrix M. IML imports both matrix C and matrix M. It employs matrix operations to construct MCM and subroutine EIGEN to compute the eigenfunctions. Finally, it outputs a working file of eigenvalues (STEP3) and a working file of eigenvectors (STEP4).


DATA STEP2;
  ARRAY M{73} M1-M73;
  DO I=1 TO 73;
    DO J=1 TO 73;
      M{J} = -1/73;
      IF I=J THEN M{J}=1-1/73;
    END;
    OUTPUT;
  END;
  DROP I J;
RUN;

PROC IML;
  USE STEP1; READ ALL INTO C;
  USE STEP2; READ ALL INTO M;
  IDEN=I(73);   /* identity matrix for the n = 73 areal units */
  CM=C*M;
  MCM=M*CM;
  CALL EIGEN(EVALS,EVECS,MCM);
  CREATE STEP3 FROM EVALS;
  APPEND FROM EVALS;
  CREATE STEP4 FROM EVECS;
  APPEND FROM EVECS;
QUIT;

Computer Code 4 Sample SAS code for generating eigenvectors from a spatial weight matrix

2.4 Statistical Features of Moran Eigenvector Spatial Filtering

Construction of an ESF begins with determination of a candidate set of eigenvectors, which is a substantially smaller subset of the total set of n eigenvectors. One criterion for identifying this set is a minimum MC of 0.25, which relates to roughly 5% of the variance in a response variable being attributable to SA (see Appendix A). Another criterion devised by Chun et al. (2016) builds on the level of SA detected in a response variable, Y, and the tessellation size being studied. The developed formula suggests that the number of eigenvectors in a candidate set may be calculated as follows:

$$ N_c = \frac{n_p}{1 + \exp\left[ 2.1480 - \frac{6.1808\,(z_{MC} + 0.6)^{0.1742}}{n_p^{0.1298}} + \frac{3.3534}{(z_{MC} + 0.6)^{0.1742}} \right]} \tag{5} $$

where Nc denotes the number of eigenvectors in a candidate set, np denotes the number of all PSA eigenvectors, and zMC denotes the z-score of the MC for the response variable Y (or some transformed version of it, such as a Box-Cox power transformation, if used). Two challenges of ESF construction concern (a) the selection of eigenvectors from a candidate set to include in the constructed variate and (b) the increase in number of candidate and selected eigenvectors with increasing n.
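Written out in code, Eq. (5) is just arithmetic; a small R helper (with the constants as reconstructed above, so treat it as a sketch):

candidate_set_size <- function(n_p, z_mc) {
  # n_p: number of positive-SA eigenvectors; z_mc: z-score of the MC of Y
  n_p / (1 + exp(2.1480 -
                 6.1808 * (z_mc + 0.6)^0.1742 / n_p^0.1298 +
                 3.3534 / (z_mc + 0.6)^0.1742))
}
# e.g., a hypothetical landscape with 36 PSA eigenvectors and z_MC = 6.2
candidate_set_size(n_p = 36, z_mc = 6.2)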


To date, eigenvector selection has been undertaken with stepwise regression procedures, an approach plagued by the pretesting problem. One redeeming feature of the eigenvectors is that they are both orthogonal and uncorrelated. Accordingly, a Bonferroni adjustment, for example, furnishes a more appropriate significance level to use with a stepwise procedure. The number of eigenvectors in a candidate set determines this adjustment. A simulation experiment employing iid random normal data illustrates this contention: the selection probability used is p = 0.01; the estimated distribution of selected eigenvectors is exactly that for a binomial distribution with p = 0.01 (see Tables 2 and 3). The Bonferroni adjustment – 0.01/(# candidate eigenvectors) – more or less fully corrects for excess vector selection. In addition, for reasonably sized samples, essentially only one vector might be selected by chance, and its percentage of variance accounted for in the response variable tends to be trivial. Alternatively, the LASSO can furnish another way to select eigenvectors to construct an ESF. For example, Seya et al. (2015) discuss that a LASSO-based ESF construction is faster and compatible with a stepwise-based ESF. One related issue is that stepwise regression tends to lead to an under-estimation of a model's residual variance, especially when a large number of independent variables are chosen. MESF can be affected by this issue when the level of SA is very high, and hence many eigenvectors are selected. Tibshirani (2005) proposes the addition of a search degrees of freedom (sdf) to correct the reported degrees of freedom for a best subset selection model. Utilization of sdf is expected to help correct any potential under-estimation of MESF model variance. Chun and Griffith (2014) discuss that ESF estimates are unbiased, efficient, and consistent. They analytically show that ESF estimates are unbiased for data generated with a spatial error (i.e., SAR) model specification process. Also, Pace et al. (2013) confirm that an ESF produces unbiased estimates of covariate regression parameters for the same process. They also find that for synthetic data generated with a spatial lag model specification, an ESF specification tends to result in some bias. Nevertheless, they conclude that an ESF reduces bias found in an ordinary least squares (OLS) solution, thus improving upon it. Chun and Griffith (2014) further show that ESF estimates are relatively efficient compared with OLS and generalized least squares (GLS) estimator solutions. In particular, the ESF estimator increasingly is more efficient than either the GLS or OLS estimators as the level of SA increases. They also show that ESF estimators, because they utilize orthogonal eigenvectors, are consistent in the context of linear regression.
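A minimal forward-selection sketch of the stepwise procedure just described is given below; it is illustrative only (packaged alternatives exist, e.g., the ME() function in the spatialreg package). Here y is the response, X a covariate matrix, and EV the candidate eigenvectors from Sect. 2.3.1; the level 0.01/ncol(EV) follows the Bonferroni adjustment discussed above:

esf_stepwise <- function(y, X, EV, alpha = 0.01) {
  a <- alpha / ncol(EV)              # Bonferroni-adjusted significance level
  selected <- integer(0)
  repeat {
    remaining <- setdiff(seq_len(ncol(EV)), selected)
    if (!length(remaining)) break
    pvals <- sapply(remaining, function(j) {
      fit <- summary(lm(y ~ X + EV[, c(selected, j)]))$coefficients
      fit[nrow(fit), 4]              # p-value of the newly added eigenvector
    })
    if (min(pvals) >= a) break       # nothing left that is significant
    selected <- c(selected, remaining[which.min(pvals)])
  }
  selected                           # column indices of the chosen eigenvectors
}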

Table 2 Frequency and other summary results of eigenvector selection simulation experiments, with 10,000 replications

                                 Number of selected eigenvectors
P-by-P    MC/MC1  # candidate      0     1     2     3     4     5    6    7   8   9  10  11  14  Maximum R2
                  vectors
6-by-6     0.25        12       8891  1014    91     4                                              0.34
6-by-6     0.50         7       9337   623    38     2                                              0.31
6-by-6     0.75         3       9695   297     8                                                    0.26
10-by-10   0.25        31       7143  2330   442    78     5     2                                  0.19
10-by-10   0.50        18       8266  1561   161    11     1                                        0.14
10-by-10   0.75         9       9093   863    44                                                    0.10
20-by-20   0.25       123       2682  3419  2332   996   412   117   32    8   2                    0.09
20-by-20   0.50        74       4660  3467  1359   404    93    15    2                             0.06
20-by-20   0.75        32       7120  2427   397    51     5                                        0.04
30-by-30   0.25       276        567  1512  2181  2149  1605  1028  551  253  93  35  17   8   1    0.05
30-by-30   0.50       163       1840  3048  2613  1490   658   242   88   18   2   1                0.04
30-by-30   0.75        76       4666  3548  1342   372    63     9                                  0.03

Table 3 Summary results of Bonferroni adjusted eigenvector selection simulation experiments, with 10,000 replications

                                 Binomial model estimation       Bonferroni     Frequency of no         Maximum  Maximum #
P-by-P    MC/MC1  # candidate    p-hat    Pearson chi-square     selection p    selected eigenvectors   R2       selected vectors
                  vectors                 deviance
6-by-6     0.25        12        0.01007  1.0605                 0.00083        9894                    0.31     2
6-by-6     0.50         7        0.01007  1.0652                 0.00143        9882                    0.29     2
6-by-6     0.75         3        0.01043  1.0307                 0.00333        9893                    0.28     2
10-by-10   0.25        31        0.01122  1.0819                 0.00032        9901                    0.10     1
10-by-10   0.50        18        0.01067  1.0274                 0.00056        9894                    0.14     1
10-by-10   0.75         9        0.01057  1.0082                 0.00111        9906                    0.09     1
20-by-20   0.25       123        0.01103  1.0800                 0.00008        9895                    0.04     2
20-by-20   0.50        74        0.01062  1.0683                 0.00014        9914                    0.04     2
20-by-20   0.75        32        0.01061  1.0132                 0.00031        9888                    0.03     1
30-by-30   0.25       276        0.01097  1.1050                 0.00004        9894                    0.01     1
30-by-30   0.50       163        0.01059  1.0503                 0.00006        9892                    0.01     1
30-by-30   0.75        76        0.01006  1.0112                 0.00013        9879                    0.01     1

3 Comparisons with the Spatial Lag Term

Conceptualization of the ESF model specification is based upon a spectral decomposition of a spatial weight matrix. The standard SAR specification furnishes the following approximation:

$$ \mathbf{Y} = \rho\mathbf{W}\mathbf{Y} + (1-\rho)\mu\mathbf{1} + \boldsymbol{\varepsilon} \tag{6} $$

which, with the row-standardized spatial weight matrix W = D⁻¹C, may be rewritten as

$$ \mathbf{Y} = \rho\mathbf{D}^{-1}\mathbf{C}\mathbf{Y} + (1-\rho)\mu\mathbf{1} + \boldsymbol{\varepsilon} \tag{7} $$

where D is a diagonal matrix whose cell (i, i) entry is the ith row sum of matrix C. Substituting the spectral decomposition C = EΛEᵀ,

$$ \mathbf{Y} = \rho\mathbf{D}^{-1}\mathbf{E}\boldsymbol{\Lambda}\mathbf{E}^{\mathrm{T}}\mathbf{Y} + (1-\rho)\mu\mathbf{1} + \boldsymbol{\varepsilon} \tag{8} $$

$$ \Rightarrow\ \mathbf{Y} = \mathbf{E}\mathbf{E}^{\mathrm{T}}\mathbf{Y} + \mu\mathbf{1} + \boldsymbol{\xi} \tag{9} $$

where E is the n-by-n matrix of eigenvectors and Λ is the corresponding diagonal matrix of eigenvalues. This approximation requires the principal eigenfunction of matrix C to be removed from consideration; in theory, vector 1 attached to the mean parameter μ replaces it. Furthermore, the remaining n − 1 eigenvectors are asymptotically uncorrelated (see Griffith 2003). A more direct ESF model specification is based upon the spatial lag model specification:

$$ \mathbf{Y} = \mu\mathbf{1} + (\mathbf{I} - \rho\mathbf{C})^{-1}\boldsymbol{\varepsilon} \tag{10} $$

$$ \mathbf{M}\mathbf{Y} = \mu\mathbf{M}\mathbf{1} + \mathbf{M}(\mathbf{I} - \rho\mathbf{C})^{-1}\boldsymbol{\varepsilon} \tag{11} $$

where M = (I − 11ᵀ/n) is the standard projection matrix used to center a vector of data values,

$$ \mathbf{M}(\mathbf{I} - \rho\mathbf{C})\mathbf{M}\mathbf{Y} = \mathbf{0} + \boldsymbol{\varepsilon} \tag{12} $$

$$ \mathbf{M}\mathbf{Y} = \rho\mathbf{M}\mathbf{C}\mathbf{M}\mathbf{Y} + \boldsymbol{\varepsilon} \tag{13} $$

where matrix MCM appears in the numerator of the MC,

$$ \mathbf{Y} = \rho\mathbf{M}\mathbf{C}\mathbf{M}\mathbf{Y} + \bar{y}\mathbf{1} + \boldsymbol{\varepsilon} \tag{14} $$

$$ \Rightarrow\ \mathbf{Y} = \mathbf{E}\mathbf{E}^{\mathrm{T}}\mathbf{Y} + \mu\mathbf{1} + \boldsymbol{\xi}. \tag{15} $$

In other words, an ESF model specification approximates a spatial autoregressive model specification by approximating the included spatial lag variable. Removing those eigenvectors for which Eⱼᵀ Y ≈ 0 reduces the covariate matrix E used in a substitution for vector WY from n-by-n to n-by-k, resulting in the general specification

$$ \mathbf{Y} = \mathbf{E}_k\boldsymbol{\beta} + \mu\mathbf{1} + \boldsymbol{\xi} \tag{16} $$

where β is a k-by-1 vector of regression parameters. This particular general specification has the regression parameter estimates

$$ \hat{\boldsymbol{\beta}} = \left(\mathbf{E}_k^{\mathrm{T}}\mathbf{E}_k\right)^{-1}\mathbf{E}_k^{\mathrm{T}}\mathbf{Y} = \mathbf{E}_k^{\mathrm{T}}\mathbf{Y} \tag{17} $$

yielding

$$ \mathbf{Y} = \mathbf{E}_k\mathbf{E}_k^{\mathrm{T}}\mathbf{Y} + \mu\mathbf{1} + \boldsymbol{\xi}. \tag{18} $$

This reduction in matrix dimension can be achieved with stepwise regression techniques, which tremendously benefit from the mutual orthogonality and uncorrelatedness of the eigenvectors of modified matrix MCM (two properties asymptotically achieved for matrix C itself). From standard linear regression theory, the individual parameter estimate variances are given by the diagonal elements of matrix

$$ \left[ \begin{pmatrix} \mathbf{1}^{\mathrm{T}} \\ \mathbf{E}_k^{\mathrm{T}} \end{pmatrix} \begin{pmatrix} \mathbf{1} & \mathbf{E}_k \end{pmatrix} \right]^{-1} \sigma_\varepsilon^2 = \begin{pmatrix} \frac{1}{n} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_k \end{pmatrix} \sigma_\varepsilon^2 \tag{19} $$

which furnishes the variance term used in the preceding estimator efficiency assessment.

The preceding discussion raises the question of how well a set of selected eigenvectors approximates the spatial lag variate WY. Figure 3 portrays the prediction of WY by judiciously selected eigenvectors for the eight specimen geospatial datasets in Chun and Griffith (2014). An ESF description of vector Y for these examples requires between 7% and 22% of the total number of eigenvectors. In contrast, an ESF description of vector WY for these examples requires between 9% and 24% of the total number of eigenvectors, only a very slight increase in the number needed for describing variable Y.



Fig. 3 Scatterplots of observed versus ESF-predicted WY vectors for the eight geographic landscapes listed in Chun and Griffith (2014)


In summary, an ESF furnishes a dimension reduction substitution for the spatial lag variate WY in a spatial statistical model specification. Because the eigenvectors are fixed for a given matrix C, this dimension reduction greatly simplifies accounting for SA in regression models.
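The closed form in Eq. (17) is likewise easy to confirm numerically; a base-R sketch on a hypothetical 30-unit chain graph:

set.seed(1)
n <- 30
C <- matrix(0, n, n); C[cbind(1:(n - 1), 2:n)] <- 1; C <- C + t(C)
M <- diag(n) - matrix(1/n, n, n)
E  <- eigen(M %*% C %*% M, symmetric = TRUE)$vectors
Ek <- E[, 1:3]                  # a small 'selected' subset of eigenvectors
y  <- rnorm(n)                  # an arbitrary response variable
coef(lm(y ~ Ek))[-1]            # least-squares slope estimates
drop(crossprod(Ek, y))          # E_k'y: the same values, per Eq. (17)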

4 An Empirical Example: A Moran Eigenvector Spatial Filtering Methodology Example

The 2007 farm densities in Puerto Rico (Y) were analyzed with conventional linear regression techniques. These densities are calculated by dividing the number of farms in a municipality by its area size. Because the statistical distribution of these farm densities is highly skewed, they were subjected to a Box-Cox power transformation: (Y − 0.12)^0.38. The Shapiro-Wilk diagnostic test indicates that the transformed variable adequately mimics a normal distribution (p-value = 0.5688). A benchmark bivariate linear regression model specification includes mean annual rainfall as a covariate variable. This specification allows a comparison between its parameter estimates and those for an ESF model.

Table 4 summarizes estimation results for the bivariate and ESF linear regression models. The residual MC tests indicate that the linear regression results contain significant SA (the MC z-score is 6.2299), whereas the ESF model successfully accounts for all but trace SA (the MC z-score is 0.8873). A change in both regression coefficient (i.e., intercept and slope) estimates and their standard errors accompanies the change in model specification. The ESF model produces a larger estimate (0.0210 vis-à-vis 0.0138) for the covariate regression slope coefficient. Its significance within the context of the ESF model also is higher. Finally, the percent of variance accounted for (R²) increases considerably with inclusion of the eight selected eigenvectors, which partly is due to an increase in the number of covariates: the mean annual rainfall variable alone explains 13%, whereas SA as captured by the ESF term explains an additional 60% of the variation in Y.

Table 4 The results of linear regression for farm densities in Puerto Rico in 2007

                              Basic model                        ESF model
Model features                Coefficient   Standard error       Coefficient   Standard error
Intercept                     0.6622        0.3009*              0.1623        0.2235
Rainfall                      0.0138        0.0042**             0.0210        0.0032***
# of selected eigenvectors    –                                  8
R²                            0.1298                             0.7283
MC (z-score of MC)            0.4303 (6.2299)                    0.1644 (0.8873)

Significance codes: (*: 0.05, **: 0.01, ***: 0.001)

Figure 4 portrays the decomposition of the 2007 farm densities based on the ESF model. Figure 4a is the map of the transformed farm densities. Figure 4b depicts the trend attributable to mean annual rainfall. This trend map illustrates that rainfall effectively explains the high values at the center of the island and in the eastern coastal areas. Figure 4c portrays a spatially structured random component as

Fig. 4 Decomposed maps of 2007 farm densities in Puerto Rico: (a) dependent variable, (b) trend latent in the mean annual rainfall covariate, (c) spatially structured random component, and (d) random residuals
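The decomposition mapped in Fig. 4 follows directly from the fitted ESF model: the fitted values separate into a covariate trend and a spatially structured component, leaving random residuals. A hedged numpy sketch, assuming a transformed response y, a rainfall covariate x, and a matrix E of stepwise-selected eigenvectors; the helper name esf_decompose is hypothetical.

```python
import numpy as np

def esf_decompose(y, x, E):
    # Fit y on [1, x, E] by OLS, then split the fitted values into pieces.
    n = len(y)
    X = np.column_stack([np.ones(n), x, E])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    trend = X[:, :2] @ coef[:2]   # cf. Fig. 4b: covariate trend
    ssre = E @ coef[2:]           # cf. Fig. 4c: spatially structured component
    resid = y - trend - ssre      # cf. Fig. 4d: random residuals
    return trend, ssre, resid
```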

5 Extensions to Spatial Interaction Data Analysis

Spatial interaction can be conceptualized in a way that is analogous to gravity in Newtonian physics. The simple gravity model motivated its doubly constrained version containing origin and destination balancing factors (Wilson 1967). The balancing factors ensure that origin and destination totals are preserved. The factors may be interpreted as origin emissivity (i.e., a competitive accessibility measure vis-à-vis destinations with respect to origin i) and destination attractiveness (i.e., a competitive accessibility measure vis-à-vis origins with respect to destination j). This in turn motivated Poisson regression model estimation of its parameters (Flowerdew and Aitkin 1982). This Poisson probability model specification led to the use of origin and destination indicator variables [a separate indicator variable is included for each origin and each destination - that is, 2n 0-1 binary variables, each having n 1s and n(n − 1) 0s], whose sets of coefficients are equivalent to the logarithm of the balancing factors when amalgamated. To avoid multicollinearity, coefficients for two of these indicator variables - one for the origin and one for the destination set - are set to 0, resulting in 2n − 2 binary variables.

Contemporary research again is focusing on the role SA plays in this model (Griffith 2011; Fischer and Griffith 2008; Griffith et al. 2017): correlation exists among flows originating near origin areal unit i and terminating at destination areal unit j. Each of the n² geographic flows tends to be positively correlated with its origin and destination sizes and negatively correlated with the extent of the intervening distance. The following simple equation furnishes a very good description of this phenomenon (see Griffith 2011):

F_ij ≈ κ e^(SF_Oi) A_i O_i e^(SF_Dj) B_j D_j e^(−γd_ij)   (20)

where
F_ij denotes the flow (e.g., number of workers for journey-to-work commutes) between locations i and j,
κ is a constant of proportionality,
SF_Oi denotes the origin i ESF accounting for SA in flows,
A_i denotes an origin balancing factor,
O_i denotes the total amount of flow leaving from origin i (e.g., number of workers residing in an origin for journey-to-work commutes),
SF_Dj denotes the destination j ESF accounting for SA in flows,
B_j denotes a destination balancing factor,
D_j denotes the total amount of flow arriving at destination j (e.g., the number of jobs available in a destination for journey-to-work commutes),
d_ij denotes the distance separating origin i and destination j, and
γ denotes the global distance decay rate.

Selected results from the estimation of Eq. (20) for the Puerto Rican 2000 journey-to-work data (874,832 inter-municipality trips for 73² = 5,329 dyads) include the following:

Set values                                    κ̂            γ̂        Overdispersion   Pseudo-R²
SF_Oi = 0, SF_Dj = 0, A_i = 1, B_j = 1        9.4 × 10⁻⁶    0.1625    14.52272         0.8039
SF_Oi = 0, SF_Dj = 0                          5.6 × 10⁻⁶    0.2286    7.98012          0.9825
None                                          5.1 × 10⁻⁶    0.2084    6.4750           0.9892

The ESF comprises 85 of 121 candidate eigenvectors (those with an MC of at least 0.25), from a total of 5,329 possible eigenvectors. The uncovered SA in these flows contributes to excess Poisson variation. Adjusting for SA in these flows yields a better alignment of the largest predicted and observed values (Fig. 5).

Figure 6 portrays the balancing factors and the ESF for the Puerto Rican journey-to-work example. Figures 6a, b, the A_i and B_j values, display conspicuous geographic patterns. Meanwhile, the origin ESF (Fig. 6c) contrasts the San Juan metropolitan region with the remainder of the island, whereas the destination ESF (Fig. 6d) highlights the four urban catchment areas (San Juan-Caguas, Arecibo, Mayaguez, and Ponce).

Fig. 5 Scatterplots of the Eq. (20)-predicted and observed journey-to-work trips: (a) SF_Oi = 0, SF_Dj = 0, A_i = 1, and B_j = 1; (b) SF_Oi = 0 and SF_Dj = 0; (c) all parameters estimated

Fig. 6 Geographic distributions of Eq. (20) terms: (a) origin balancing factor (A_i); (b) destination balancing factor (B_j); (c) origin spatial filter; (d) destination spatial filter
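For concreteness, a minimal sketch of the indicator-variable Poisson estimation described above (Flowerdew and Aitkin 1982), using statsmodels on synthetic flows; the toy size, coordinates, and generating values are assumptions for illustration only. Adding the selected origin and destination eigenvectors as further columns of X would give the ESF-augmented specification of Eq. (20).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10                                        # toy number of areal units
xy = rng.uniform(0, 1, (n, 2))

# all n^2 origin-destination dyads
i_idx, j_idx = [a.ravel() for a in np.meshgrid(np.arange(n), np.arange(n), indexing="ij")]
d = np.linalg.norm(xy[i_idx] - xy[j_idx], axis=1)
d[d == 0] = 0.05                              # crude intrazonal distance (assumption)

O = np.eye(n)[i_idx]                          # origin indicator variables
D = np.eye(n)[j_idx]                          # destination indicator variables
flows = rng.poisson(np.exp(1.0 - 2.0 * d))    # synthetic flows, decay rate 2

# drop one origin and one destination indicator to avoid multicollinearity
X = np.column_stack([np.ones(n * n), O[:, 1:], D[:, 1:], -d])
fit = sm.GLM(flows, X, family=sm.families.Poisson()).fit()
gamma_hat = fit.params[-1]                    # estimate of the distance decay rate
```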

6 Extensions to Space-Time Data Analysis

MESF can be extended to model space-time data that basically have a panel data structure. That is, observations are measured at multiple time points across all areal units. A possible interweaving of SA and temporal autocorrelation can make appropriately describing such space-time data more complicated. An MESF extension has been developed in two ways. The first is to consider that spatial and temporal autocorrelation components arise from common random effects in space-time. This approach utilizes linear (or generalized linear) mixed model approaches. The second is to accommodate the structure of space-time autocorrelation in a space-time weight matrix.

The first approach begins with a standard linear mixed model that can be expressed as follows:

Y_it = x_it β + z_i + ε_it   (21)

where zi is the ith value in a random effects term, Z, constituting a common factor spanning the time period under study. The area-specific random effects term, with a


dimension of n-by-1, often is assumed to follow N(0, σ²I). When SA is not appropriately accounted for (e.g., missing variables), Z is likely to be spatially autocorrelated. Using MESF with eigenvectors, Z can be decomposed into spatially structured (Z_ssre) and spatially unstructured (Z_sure) random effects components. This extension supports investigating SA in terms of a common factor across a time period. For example, Hu et al. (2018) analyze lung cancer rates in seven metropolitan areas in Florida utilizing this approach when investigating whether or not both PSA and NSA components exist in these cancer rates.

The second approach specifies a space-time dependency structure in a space-time weight matrix, exploiting the extension of the standard MC to a space-time



MC. Griffith (2012) discusses two popular conceptualizations: the lagged and contemporaneous spatial dependence structures that are illustrated in Fig. 7 with municipalities in Puerto Rico. For the lagged structure (Fig. 7a), Y_it is associated with Y_i,t−1 as well as spatially neighboring values at time t − 1 (i.e., Y_j,t−1, where i and j are spatial neighbors). In contrast, for the contemporaneous structure, Y_it is associated with Y_i,t−1 as well as Y_j,t; that is, spatially neighboring values at time t.

Fig. 7 Space-time dependence structures. (a) Time-lagged spatial dependence. (b) Contemporaneous spatial dependence

Accordingly, a space-time weight matrix (STWM) can be constructed from a spatial weight matrix and a temporal weight matrix as follows:

Lagged structure: M[C_t ⊗ (C_s + I_s)]M   (22)

Contemporaneous structure: M[I_t ⊗ C_s + C_t ⊗ I_s]M   (23)

where C_s denotes an n-by-n spatial weight matrix, and C_t denotes a T-by-T temporal weight matrix that has non-zero values for the upper and lower off-diagonal cells. I_s and I_t are identity matrices, respectively with dimensions n-by-n and T-by-T, and ⊗ denotes the Kronecker product matrix operation. Note that here M = I_st − 1_st 1_st′/(nT) is nT-by-nT dimensional, computed with I_st, an nT-by-nT dimensional identity matrix, and 1_st, an nT-by-1 vector of ones. Eigenvectors generated from either Eq. (22) or (23) can be selected with a stepwise procedure in the same fashion as for ESF construction, and can capture space-time autocorrelation components in the residuals of a space-time regression model; that is, a selected set of such eigenvectors forms an eigenvector space-time filter (ESTF). Griffith (2012) illustrates that an ESTF model specification successfully accounts for space-time autocorrelation in urbanization across Puerto Rico between 1899 and 2000.
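A short numpy sketch of the constructions in Eqs. (22) and (23), assuming a spatial weight matrix Cs for n areal units and T time points; the function name stwm is an illustrative assumption. Eigenvectors of the returned matrix furnish the ESTF candidates.

```python
import numpy as np

def stwm(Cs, Ct, lagged=True):
    # Space-time weight matrix, doubly centered with M = I - 11'/(nT).
    n, T = Cs.shape[0], Ct.shape[0]
    if lagged:
        W = np.kron(Ct, Cs + np.eye(n))                       # Eq. (22)
    else:
        W = np.kron(np.eye(T), Cs) + np.kron(Ct, np.eye(n))   # Eq. (23)
    nT = n * T
    M = np.eye(nT) - np.ones((nT, nT)) / nT
    return M @ W @ M

# temporal weights: 1s on the first upper and lower off-diagonals
T = 4
Ct = np.diag(np.ones(T - 1), 1) + np.diag(np.ones(T - 1), -1)
```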

7 Conclusions

This chapter provides an introductory discussion about the concept of SA together with MESF methodology. MESF furnishes a method to properly analyze a georeferenced variable by effectively separating a spatially structured random component from the trend and random noise present in that variable. MESF offers several advantages over other spatial data modeling methodologies, including spatial autoregression. First, it allows researchers to model spatially autocorrelated variables with conventional statistical tools, such as linear regression, which otherwise may well produce biased estimates with standard methodology, such as OLS. Second, it provides a way to analytically decompose a variable into underlying components, including a spatial component. Third, it provides a synthetic variate (the ESF) whose mapping visualizes SA contained in a georeferenced variable. This visual representation can lead to a better understanding of geographic phenomena, in part by furnishing a clue about possible variables missing from a linear regression model specification. In addition, MESF methodology produces an enhanced and robust statistical result when compared with alternative model specifications, such as OLS linear regression and spatial autoregression: its parameters appear to be unbiased, relatively efficient, and consistent.

Beyond the discussion in this chapter, the MESF methodology also offers the advantage of being able to accommodate PSA in a generalized linear model with a discrete response variable, such as the Poisson and negative binomial, which cannot be modeled with the traditional auto specifications (Besag 1974). Further, the MESF methodology furnishes technology that can be utilized to account for SA in geographical flows, such as population migration, journey-to-work commutes, and interregional commodity shipments. In other words, it offers a much wider array of tools, in more advanced spatial data analysis settings, than is addressed by the discussion in this chapter.

Finally, the MESF methodology can be extended to other spatial analysis problems. Spatial interaction modeling furnishes one such extension. Adjusting for SA in geographic flow descriptions allows a better estimate of the global distance decay parameter, accounts for considerable excess variation characterizing flows, and better aligns the magnitudes of predicted and observed flows. In addition, the MESF methodology furnishes a very flexible framework to account for spatial and temporal autocorrelation simultaneously in a single model specification.

8 Cross-References

▶ Complexity and Spatial Networks
▶ Exploratory Spatial Data Analysis
▶ Spatial Data and Spatial Statistics
▶ Spatial Econometric OD-Flow Models


Appendix A

The relationship between the MC and a squared product moment correlation coefficient (r²). MC can be derived as a linear regression solution; from OLS theory, b = (XᵀX)⁻¹XᵀY:

(i) Convert the attribute variable in question to z-scores
(ii) Let X = z_Y and Y = Cz_Y
(iii) Regress Cz_Y on z_Y, with a no-intercept option
(iv) Let X = 1 and Y = C1
(v) Regress C1 on 1, with a no-intercept option

MC = b_numerator/b_denominator, where b_numerator is the slope from step (iii) and b_denominator is the slope from step (v). This relationship links directly to the Moran scatterplot, conveying why it is a useful visualization of SA.

Next, let MC = (n/1ᵀC1)zᵀCz/(n − 1) and rewrite vector z as the following bivariate regression model specification: z = a1 + bCz + e, where e is an n-by-1 vector of residuals. Then

b = zᵀCz/zᵀC²z = (s_z/s_Cz)r = (1/s_Cz)r   (A.1)

MC/MC₁ = [(n/1ᵀC1)zᵀCz/(n − 1)] / [(n/1ᵀC1)λ₁] = r √[zᵀC(I − 11ᵀ/n)Cz/(n − 1)] / λ₁   (A.2)

(MC/MC₁)² = r² zᵀC(I − 11ᵀ/n)Cz / [(n − 1)λ₁²]   (A.3)

r² = λ₁²(n − 1)(MC/MC₁)² / [zᵀC(I − 11ᵀ/n)Cz]   (A.4)

where MC₁ denotes the maximum value of MC for a given spatial weight matrix C. For a large P-by-Q regular square lattice (i.e., n = PQ) and the rook's adjacency definition, for which MC₁ ≈ 1, if MC = 0.25, then

1ᵀCz ≈ 0   (A.5)

zᵀC²z ≈ 16(PQ − 1)   (A.6)

λ₁ = 2[cos(π/(P + 1)) + cos(π/(Q + 1))] ≈ 4   (A.7)

Because Eq. (A.5) implies zᵀC(I − 11ᵀ/n)Cz ≈ zᵀC²z, substituting Eqs. (A.6) and (A.7) into Eq. (A.4) yields r² ≈ 0.05. Therefore, roughly 5% of the variance of a spatially autocorrelated random variable with MC = 0.25 is attributable to spatial autocorrelation.
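A toy numerical check of steps (i)-(v), on an assumed 5-by-5 rook-adjacency lattice with simulated values (not data from this chapter): the ratio of the two no-intercept slopes reproduces the usual MC formula.

```python
import numpy as np

P = Q = 5
n = P * Q
C = np.zeros((n, n))
for r in range(P):
    for c in range(Q):
        i = r * Q + c
        if r + 1 < P:
            C[i, i + Q] = C[i + Q, i] = 1   # rook neighbor below
        if c + 1 < Q:
            C[i, i + 1] = C[i + 1, i] = 1   # rook neighbor to the right

rng = np.random.default_rng(0)
y = rng.normal(size=n)
z = (y - y.mean()) / y.std(ddof=1)          # step (i): z-scores

b_num = (z @ C @ z) / (z @ z)               # step (iii): slope of Cz on z
b_den = (np.ones(n) @ C @ np.ones(n)) / n   # step (v): slope of C1 on 1
MC = b_num / b_den                          # MC = b_numerator / b_denominator

# agrees with MC = (n / 1'C1) z'Cz / (n - 1)
assert np.isclose(MC, (n / C.sum()) * (z @ C @ z) / (n - 1))
```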

References

Besag J (1974) Spatial interaction and the statistical analysis of lattice systems. J R Stat Soc Ser B 36(2):192–225
Borcard D, Legendre P (2002) All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecol Model 153(1–2):51–68
Chun Y, Griffith DA (2014) A quality assessment of eigenvector spatial filtering based parameter estimates for the normal probability model. Spat Stat 10:1–11
Chun Y, Griffith DA, Lee M, Sinha P (2016) Eigenvector selection with stepwise regression techniques to construct eigenvector spatial filters. J Geogr Syst 18(1):67–85
Cliff AD, Ord JK (1981) Spatial processes: models and applications. Pion, London
de Jong P, Sprenger C, van Veen F (1984) On extreme values of Moran's I and Geary's c. Geogr Anal 16(1):17–24
Fischer M, Griffith DA (2008) Modeling spatial autocorrelation in spatial interaction data: a comparison of spatial econometric and spatial filtering specifications. J Reg Sci 48(5):969–989
Flowerdew R, Aitkin M (1982) A method of fitting the gravity model based on the Poisson distribution. J Reg Sci 22(2):191–202
Getis A (2010) Spatial autocorrelation. In: Fischer M, Getis A (eds) Handbook of applied spatial analysis. Springer, New York, pp 255–278
Getis A, Griffith DA (2002) Comparative spatial filtering in regression analysis. Geogr Anal 34(2):130–140
Griffith DA (1992) What is spatial autocorrelation? Reflections on the past 25 years of spatial statistics. l'Espace Géographique 21(3):265–280
Griffith DA (2003) Spatial autocorrelation and spatial filtering: gaining understanding through theory and scientific visualization. Springer, Berlin
Griffith DA (2006) Hidden negative spatial autocorrelation. J Geogr Syst 8(4):335–355
Griffith DA (2010) The Moran coefficient for non-normal data. J Stat Plann Infer 140(11):2980–2990
Griffith DA (2011) Visualizing analytical spatial autocorrelation components latent in spatial interaction data: an eigenvector spatial filter approach. Comput Environ Urban Syst 35(2):140–149
Griffith DA (2012) Space, time, and space-time eigenvector filter specifications that account for autocorrelation. Estadística Española 54(177):7–34
Griffith DA, Chun Y (2016) Evaluating eigenvector spatial filter corrections for omitted georeferenced variables. Econometrics 4(2):29
Griffith DA, Fischer MM, LeSage J (2017) The spatial autocorrelation problem in spatial interaction modelling: a comparison of two common solutions. Lett Spat Resour Sci 10(1):75–86
Griffith DA, Chun Y, Li B (2019) Spatial regression analysis using eigenvector spatial filtering. Elsevier, Cambridge, MA
Haining R (1991) Bivariate correlation with spatial data. Geogr Anal 23(3):210–227
Hu L, Griffith DA, Chun Y (2018) Space-time statistical insights about geographic variation in lung cancer incidence rates: Florida, 2000–2011. Int J Environ Res Public Health 15(11):2406
Lee M, Chun Y, Griffith DA (2018) Errors propagation in spatial modeling of public health data: a simulation approach using pediatric blood lead level data for Syracuse, New York. Environ Geochem Health 40(2):667–681
Pace K, LeSage J, Zhu S (2013) Interpretation and computation of estimates from regression models using spatial filtering. Spat Econ Anal 8(3):352–369
Páez A (2018) Using spatial filters and exploratory data analysis to enhance regression models of spatial data. Geogr Anal, online first. https://doi.org/10.1111/gean.12180
Patuelli R, Griffith DA, Tiefelsdorf M, Nijkamp P (2011) Spatial filtering and eigenvector stability: space-time models for German unemployment data. Int Reg Sci Rev 34(2):235–280
Seya H, Murakami D, Tsutsumi M, Yamagata Y (2015) Application of LASSO to the eigenvector selection problem in eigenvector based spatial filtering. Geogr Anal 47(3):284–299
Tibshirani RJ (2015) Degrees of freedom and model search. Stat Sin 25(3):1265–1296
Tiefelsdorf M, Boots BN (1995) The exact distribution of Moran's I. Environ Plann A 27(6):985–999
Tobler W (1969) A computer movie simulating urban growth in the Detroit region. Paper prepared for the meeting of the International Geographical Union, Commission on Quantitative Methods, Ann Arbor, Michigan, August; published in 1970, Econ Geogr 46(2):234–240
Wilson A (1967) A statistical theory of spatial distribution models. Transp Res 1(3):253–269

Geographically Weighted Regression

David C. Wheeler

Contents

1 Introduction
2 Model Specification
3 Model Estimation
4 Implementation
5 Issues
5.1 Statistical Inference
5.2 Collinearity
6 Diagnostic Tools
7 Extensions
8 Alternatives
8.1 Bayesian Spatially Varying Coefficient Models
8.2 Local Weighted Quantile Sum Regression
9 Application: Residential Chlordane Exposure in Los Angeles County
10 Conclusions
11 Cross-References
References

Abstract

Geographically weighted regression (GWR) was proposed in the geography literature to allow relationships in a regression model to vary over space. In contrast to traditional linear regression models, which have constant regression coefficients over space, regression coefficients are estimated locally at spatially referenced data points with GWR. The motivation for the introduction of GWR is the idea that a set of constant regression coefficients cannot adequately capture spatially varying relationships between covariates and an outcome variable. GWR is based on the appealing idea from locally weighted regression of estimating local models for curve fitting using subsets of observations centered on a focal point. GWR has been applied widely in diverse fields, such as ecology, forestry, geography, and regional science. At the same time, published work from several researchers has identified methodological issues and concerns with GWR and has questioned the application of the method for inferential analysis. One of the concerns with GWR is the strong correlation in estimated coefficients for multivariate regression terms, which makes interpretation of map patterns for individual terms problematic. The evidence in the literature suggests that GWR is a relatively simple and effective tool for spatial interpolation of an outcome variable and a more problematic tool for inferring spatial processes in regression coefficients. The more complex approach of Bayesian spatially varying coefficient models has been demonstrated to better capture spatial nonstationarity in regression coefficients than GWR and is recommended as an alternative for inferential analysis.

Keywords

Spatial nonstationarity · Spatially varying relationships · Local regression · Spatial interpolation

1 Introduction

Geographically weighted regression (GWR) was proposed in the geography literature by Brunsdon et al. (1996) to allow relationships in a regression model to vary over space. In contrast to traditional linear regression models, where the regression coefficients are constant over space, regression coefficients are estimated locally at spatially referenced data points with GWR. The movement of local regression coefficients away from their global values, where the global values come from a traditional linear regression model, is termed parametric nonstationarity, and spatial nonstationarity in the case of spatial processes (Brunsdon et al. 1996). The motivation for the introduction of GWR is the idea that it is unreasonable to assume that a set of constant regression coefficients can adequately capture spatially varying relationships between covariates and an outcome variable.

GWR is based on the simple idea of estimating local models using subsets of observations located around a focal point. GWR has as its methodological foundation the nonparametric technique of locally weighted regression, developed in statistics for curve fitting and smoothing applications. In locally weighted regression, parameters are estimated using subsets of data proximate to a model estimation point in variable space, where observations in the subset are given weights that decrease with increasing distance in variable space. The modification proposed with GWR is to use a subset of data proximate to the model estimation location in geographical space in place of variable space. Though the emphasis with traditional locally weighted regression in statistics has been on curve fitting, i.e., predicting or estimating the outcome variable (Cleveland and Devlin 1988), GWR has been presented as a method for


conducting statistical inference on spatially varying relationships, in an attempt to extend the original emphasis on prediction to confirmatory analysis. The use of GWR for inferential analysis has been questioned and criticized; however, it has been suggested that the method is more appropriately used for interpolation of an outcome variable, which is more in harmony with its origins. This chapter reviews the details of specifying and estimating a geographically weighted regression model, summarizes a few important concerns with GWR, presents some diagnostic tools and alternative approaches, and concludes with an illustrative analysis of estimating concentrations of the pesticide chlordane with GWR.

2 Model Specification

In notation, the foundation for GWR is the traditional linear regression model

y_i = β₀ + Σ_{k=1}^{p−1} β_k x_ki + e_i   (1)

where y_i is the normally distributed outcome variable and x_ki is the value for the kth covariate for observation i, β₀ is the intercept, β_k is the regression coefficient for the kth covariate, and e_i is the random error for observation i. There are p regression coefficients to estimate with the linear regression model. GWR specifies that the regression coefficients vary over observations as

y_i = β_0i + Σ_{k=1}^{p−1} β_ki x_ki + e_i   (2)

where there is now an intercept and covariate regression coefficient for each data point. For i = 1, . . . , n observations in the dataset, there are np regression coefficients to estimate because p coefficients are estimated at each of the n observations. It is required that the observations are spatially referenced, i.e., spatial coordinates are known to represent each data point. The observations may be areal units or individual-level data, such as residences. In the case of area data, the centroid of each areal unit is typically used as the basis for the spatial coordinates.

As with the linear regression model, it is convenient to express the GWR model in matrix notation

y_i = X_i β_i + e_i   (3)

where X_i is a row vector of explanatory variables and β_i is a column vector of regression coefficients at location i. In GWR, spatial structure is specified in the model through applying weights to the data. The weights are applied to the outcome variable and the covariates. The weights are calculated from a kernel


function that typically assigns more weight to observations that are spatially closer to the data point (ith location) where the model is estimated. The introduction of the weights into the model follows from the assumption of spatial autocorrelation, where observations more proximate in space are thought to be more similar. Spatial autocorrelation ignored in the linear regression model results in spatially correlated errors, a violation of the model assumption of independent and identically distributed errors. One can choose to model spatial autocorrelation either through the error term or through the regression coefficients, which is the GWR approach.

The kernel function used to calculate the weights in the GWR setting takes as input distances between all locations, conveniently in the form of a distance matrix. The kernel function has a bandwidth parameter that determines the spatial range of the kernel. The bandwidth parameter must be selected a priori or estimated from the data. The function returns a weight between locations that is inversely related to distance. A number of different kernel functions can be used in GWR. There are two general types of kernel functions, adaptive and fixed. Adaptive kernel functions inherently adjust for the density of data points by using a bandwidth expressed in number of observations. This results in a spatially larger kernel in data-sparse areas and a smaller kernel in data-dense areas. Fixed kernel functions have a bandwidth expressed as a constant distance; hence, the kernel is the same spatial size regardless of the density of data points. This could result in a varying number of observations weighted in kernels across the study area if the kernel function has weights that are zero beyond a certain distance.

Some of the most popular fixed kernel functions applied within GWR are continuous functions that produce weights that monotonically decrease with distance, such as the Gaussian or exponential kernel functions. The Gaussian kernel function is

w_ij = exp[−(1/2)(d_ij/γ)²]   (4)

where w_ij is the weight for data at location j in the model estimated for location i; d_ij is the distance between locations i and j; γ is the kernel bandwidth, a distance, that controls the decay and range of spatial correlation; and exp() is the exponential function. For matrix multiplications in the calculations of the model parameter estimates in GWR, the n weights for each model calibration location i, in row vector w_i, are placed in an n × n weights matrix W. The simpler exponential kernel function is

w_ij = exp(−d_ij/γ)   (5)

which removes the scaling and powering of the Gaussian function. Another fixed kernel function is the bi-square kernel function

w_ij = [1 − (d_ij²/γ²)]² if d_ij ≤ γ, and w_ij = 0 if d_ij > γ   (6)

where the weight wij = 0 if the interpoint distance exceeds the kernel bandwidth γ. Hence, this function is continuous until a distance threshold is reached and then is constant (zero) beyond the threshold. A similar kernel function is the tricube function

w_ij = [1 − (d_ij³/γ³)]³ if d_ij ≤ γ, and w_ij = 0 if d_ij > γ   (7)

One of the more popular adaptive kernel functions is the bi-square nearest neighbor kernel. The function is

w_ij = [1 − (d_ij/d_iN)²]² if j is one of the Nth nearest neighbors of i, and w_ij = 0 otherwise   (8)

where diN is the distance to the Nth nearest neighbor of location i and the number N of spatially nearest neighbors to use in the kernel function is estimated from the data. This function assigns a nonzero weight that decays with distance to points within the threshold number of neighbors and a weight of zero to points that are beyond the distance to the Nth nearest neighbor. Given the several options for a kernel function, one must first select the form of the kernel function before estimating the GWR model parameters, including the kernel bandwidth. Conventional thinking from the statistical nonparametric literature holds that the selection of the functional form for the kernel is less important than the selection of the kernel bandwidth for the model estimation results. While this thinking is likely appropriate for GWR, a systematic assessment of the relative performance of different kernel functions in GWR has not been reported. Instead, most research has assumed a kernel function and focused on a criterion to select the kernel bandwidth.
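For concreteness, the kernel functions of Eqs. (4)-(8) translate directly into code. A hedged Python (numpy) sketch, with d an array of distances from the calibration location; the function names are illustrative assumptions.

```python
import numpy as np

def gaussian(d, gamma):                 # Eq. (4)
    return np.exp(-0.5 * (d / gamma) ** 2)

def exponential(d, gamma):              # Eq. (5)
    return np.exp(-d / gamma)

def bisquare(d, gamma):                 # Eq. (6): zero beyond the bandwidth
    return np.where(d <= gamma, (1 - (d / gamma) ** 2) ** 2, 0.0)

def tricube(d, gamma):                  # Eq. (7)
    return np.where(d <= gamma, (1 - (d / gamma) ** 3) ** 3, 0.0)

def adaptive_bisquare(d, N):            # Eq. (8): bandwidth = Nth neighbor distance
    d_iN = np.sort(d)[N - 1]            # note: includes the zero self-distance if present
    return np.where(d <= d_iN, (1 - (d / d_iN) ** 2) ** 2, 0.0)
```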

3 Model Estimation

There are two methods for estimating the kernel bandwidth in GWR: cross-validation and minimization of the corrected Akaike Information Criterion (AIC; Akaike 1973). Of the two approaches, cross-validation appears more commonly used, likely due to its heavy use in other related areas of statistics, such as local regression and statistical learning, and its conceptual simplicity. Cross-validation is an iterative process that searches for the kernel bandwidth that minimizes the prediction error of all the observed outcome values using a subset of


the data for prediction. The kernel bandwidth, denoted here generally as γ, is estimated in cross-validation by finding the γ that minimizes the cross-validation (CV) score. The sum of CV errors and the root mean squared prediction error (RMSPE) have been used as the CV score. The kernel bandwidth that minimizes the sum of CV errors, denoted by γ̂, is defined as

γ̂ = argmin_γ Σ_{i=1}^n [y_i − ŷ_(i)(γ)]²   (9)

where ŷ_(i) is the predicted value of observation i with calibration location i left out of the estimation dataset. The kernel bandwidth minimizing the RMSPE is

γ̂ = argmin_γ √[(1/n) Σ_{i=1}^n (y_i − ŷ_(i)(γ))²]   (10)
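A minimal numpy sketch of evaluating the CV score of Eq. (9) for one candidate bandwidth, assuming a Gaussian kernel, a precomputed distance matrix D, and a design matrix X with a leading column of ones; setting the ith weight to zero implements the leave-one-out prediction ŷ_(i).

```python
import numpy as np

def cv_score(X, y, D, gamma):
    # Sum of squared leave-one-out prediction errors, Eq. (9)
    sse = 0.0
    for i in range(X.shape[0]):
        w = np.exp(-0.5 * (D[i] / gamma) ** 2)   # Gaussian kernel, Eq. (4)
        w[i] = 0.0                               # leave observation i out
        XtW = X.T * w                            # X'W_i without forming W_i
        b_i = np.linalg.solve(XtW @ X, XtW @ y)
        sse += (y[i] - X[i] @ b_i) ** 2
    return sse

# grid evaluation over candidate bandwidths:
# gamma_hat = min(np.linspace(0.1, 5.0, 50), key=lambda g: cv_score(X, y, D, g))
```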

This form of cross-validation is known as leave-one-out because only one observation is removed from the dataset for each local model when estimating the kernel bandwidth. The data point i is removed when estimating y_i to avoid estimating it perfectly. There are several search routines available for finding the optimal kernel bandwidth, including the golden search and the bisection search. Alternatively, one may systematically evaluate the CV score over a range of reasonable possible kernel bandwidths. In the kernel functions described above, the kernel bandwidth is a global parameter and is applied to all local models individually.

In contrast to cross-validation, the corrected AIC approach to estimating the kernel bandwidth is based on minimizing the estimation error of the outcome variable, not on the prediction of the outcome variable. The corrected AIC in GWR is adopted from locally weighted regression. The AIC is a compromise between model complexity and goodness-of-fit of the model, as there is a penalty for the effective number of parameters in the model. The corrected AIC for GWR is

AICc = 2n log(σ̂) + n log(2π) + n[(n + trace(H))/(n − 2 − trace(H))]   (11)

where σ̂ is the estimated standard deviation of the error, H is the hat matrix, and the trace of a matrix is the sum of the matrix diagonal elements. The kernel bandwidth is used in the calculation of H and σ̂. Row i of the hat matrix is defined as

H_i = X_i(XᵀW_iX)⁻¹XᵀW_i   (12)

which can also be expressed as

H_i = X_iA_i   (13)


The estimated error variance is

σ̂² = Σ_{i=1}^n (y_i − ŷ_i)² / [n − 2 trace(H) + trace(HᵀH)]   (14)

To estimate the kernel bandwidth using the AIC, one can either use a search algorithm or evaluate the AIC over a range of possible bandwidth values to find the bandwidth that minimizes the AIC. The second approach is commonly used to show the relationship between the AIC and the kernel bandwidth.

The GWR estimates for the outcome variable values, ŷ_i, are calculated by

ŷ_i = X_i β̂_i   (15)

where β̂_i is a column vector of estimated regression coefficients at location i. The vector of estimated regression coefficients at one location is

β̂_i = (XᵀW_iX)⁻¹XᵀW_iY   (16)

where X = [X₁ᵀ; X₂ᵀ; . . . ; X_nᵀ]ᵀ is the design matrix of covariates with a leading column of ones for the intercept, W_i = diag[w_i1, . . . , w_in] is the n × n diagonal weights matrix calculated for each location i, Y is the n × 1 vector of outcome variables, and β̂_i = (β̂_0i, β̂_1i, . . . , β̂_{p−1,i})ᵀ is the vector of p local regression coefficients at location i for p − 1 explanatory variables and the intercept. The weight matrix W_i must be calculated at each location using the kernel function and bandwidth before the local regression coefficients can be estimated. The predictions of the outcome variable values ŷ_(i) in cross-validation are calculated similarly, but with the element w_ii = 0 to effectively remove the ith observation from consideration in the model to predict y_i. Given the definition of the estimated regression coefficients, GWR can be viewed as a locally weighted least squares regression model where the weights associate pairs of data points.

In previous studies using GWR, researchers have mapped estimated regression coefficients in attempts to interpret the spatial pattern of the coefficients in the context of the research problem (Brunsdon et al. 1996; Wheeler and Tiefelsdorf 2005). Researchers have typically been interested in where the estimated regression coefficients are statistically significant, according to some prespecified significance level. In the frequentist setting of traditional GWR, statistical significance tests of the coefficients use the variance of the estimated regression coefficients. According to Fotheringham et al. (2002, p. 55), the variance of the regression coefficients is

Var[β̂_i] = A_iA_iᵀσ̂²   (17)

Technically, this equation is incorrect because the Fotheringham et al. (2002) version of GWR is not a formal statistical model with kernel weights that are


specified as part of the errors. The equation used for the local coefficient covariance is only approximate when using cross-validation because the kernel weights are calculated from the data first, before the regression coefficients are estimated from the data. The kernel weights are inherently a function of the outcome variable, as are the regression coefficients, and the correct expression for the coefficient covariance would be nonlinear.
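Collecting Eqs. (12)-(17), a compact numpy sketch of local estimation with a fixed Gaussian kernel; gwr_fit is an illustrative name, and the returned coefficient variances carry exactly the approximation caveat just discussed.

```python
import numpy as np

def gwr_fit(X, y, D, gamma):
    n, p = X.shape
    B = np.empty((n, p))                        # local coefficients, Eq. (16)
    H = np.empty((n, n))                        # hat matrix rows, Eq. (12)
    V = np.empty((n, p))                        # diagonals of A_i A_i', for Eq. (17)
    for i in range(n):
        w = np.exp(-0.5 * (D[i] / gamma) ** 2)  # Gaussian kernel, Eq. (4)
        XtW = X.T * w
        A_i = np.linalg.solve(XtW @ X, XtW)     # A_i = (X'W_iX)^{-1} X'W_i
        B[i] = A_i @ y
        H[i] = X[i] @ A_i                       # Eq. (13)
        V[i] = np.diag(A_i @ A_i.T)
    yhat = np.sum(X * B, axis=1)                # Eq. (15)
    # estimated error variance, Eq. (14)
    sigma2 = np.sum((y - yhat) ** 2) / (n - 2 * np.trace(H) + np.trace(H.T @ H))
    return B, V * sigma2, yhat, sigma2          # coefficient variances per Eq. (17)
```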

4 Implementation

Several implementations of GWR are freely available for the R-computing environment, including but not limited to the packages spgwr written by Roger Bivand and Danlin Yu and gwrr written by the author. The package spgwr contains functions for estimating GWR model parameters by minimizing the AIC or using cross-validation with several possible kernel functions, including a Gaussian kernel, a bi-square kernel, and a tricube kernel. The gwrr package has functions for estimating GWR model parameters by cross-validation with exponential and Gaussian kernel functions and a bisection search routine. The gwrr package also has a function to diagnose collinearity with GWR models and functions to estimate penalized versions of GWR, known as geographically weighted ridge regression and the geographically weighted lasso, to dampen collinearity effects.

5 Issues

5.1 Statistical Inference

Though the introduction of GWR has provided an approach to investigate regression relationships that may vary over space, there are several critiques of the method. A central issue is a lack of formal statistical inference. GWR lacks a unified statistical framework, as it is effectively an ensemble of local spatial regressions, where the dependence between regression coefficients at different data locations is not specified in the model. This is a fixed effects model with no pooling across estimates. A consequence of a lack of a formal statistical model is that the standard error calculations in GWR are only approximate. This fact is due to reusing data for parameter estimation at multiple locations and using the data to estimate first the kernel bandwidth through cross-validation and then the regression coefficients (Wheeler and Calder 2007). The implication of the approximate standard errors is that the confidence intervals for estimated GWR coefficients are only approximate and should not be considered exactly reliable for detecting statistically significant covariate effects.

Another issue for inference on regression relationships with GWR is with the nature and amount of spatial variation in the estimated coefficients, i.e., nonstationarity. Tests for significant spatial variation in the estimated coefficients for one term in a GWR model have been proposed by Fotheringham et al. (2002) and


Leung et al. (2000a). However, the tests do not consider the source of the spatial variation observed in the coefficients. There is concern that variation in the pattern of estimated coefficients may be artificially introduced by the smoothing methodology in GWR and may not represent true nonstationarity in the regression effects (Wheeler and Tiefelsdorf 2005). In other words, nonstationary regression effects could be an artifact of the methodology. Additionally, regression coefficient variability in GWR could result from collinearity effects. In this light of uncertain statistical inference, GWR is more appropriately viewed as an exploratory approach and not a formal model to infer parameter nonstationarity. This view conflicts with the broad application of GWR as an inferential method.

Instead of formal statistical inference on spatially varying regression effects, GWR is perhaps better suited to estimation and prediction of an outcome variable. This use would be more congruous with the theoretical origins of GWR in local linear regression, which was developed to estimate a response variable locally. GWR has produced favorable results in estimating a dependent variable compared with other interpolation techniques (Páez et al. 2008). Another argument for using GWR as a local estimator of a response variable is that when interpolation of an outcome variable over space is the main interest, regression coefficient estimation issues in GWR, such as collinearity, are no longer a major concern.

5.2 Collinearity

An issue that can interfere with statistical inference in linear regression models generally is collinearity. Collinearity is the presence of linear dependencies in the design matrix of a regression model, resulting in redundant information in the design matrix and an ill-conditioned variance matrix. Some of the negative consequences of collinearity are overestimates of covariate effect magnitudes, coefficient sign reversals, inflated variances for regression coefficients, and strong correlation in two or more estimated regression coefficients, all of which are likely to lead to incorrect interpretations of relationships in the regression model. These symptoms of collinearity have all been observed with GWR models for either simulated or actual datasets (Wheeler and Tiefelsdorf 2005; Waller et al. 2007; Griffith 2008; Finley 2011). The most conspicuous result of estimated GWR models pointing to collinearity effects in many studies has been strongly correlated regression coefficients for pairs of regression terms, including the intercept, evident in maps or scatter plots of estimated coefficients.

Particular concern about collinearity symptoms is warranted with GWR, as collinearity has been found in empirical work to be an issue in local GWR models when it is not present in the traditional linear regression model with the same data (Wheeler 2007). Wheeler and Tiefelsdorf (2005) show through a simulation study that although GWR coefficients can be correlated when there is no correlation in explanatory variables, the coefficient correlation increases systematically with increasingly stronger collinearity.

6 Diagnostic Tools

There are several established diagnostic tools that have become an essential part of the practice of model fitting for traditional linear regression models, including methods to check for collinearity, influential observations, and autocorrelation. Use of a more complex regression model, such as GWR, should also be complemented with diagnostic tools. Methods to identify spatial residual autocorrelation in GWR models have been developed by Leung et al. (2000b) and by Páez et al. (2002). A limitation of these approaches is that they are not model based, remembering that the GWR method is a collection of local models that are not part of a unified framework. As a result, it is not clear that the source of autocorrelation can be identified. Farber and Páez (2007) proposed a method to adjust for influential observations in the cross-validation of GWR.

The need for diagnostic tools for collinearity in GWR models is motivated by several examples in the literature of the presence of redundant information in sets of estimated GWR coefficients from models built for different datasets (Wheeler and Tiefelsdorf 2005; Waller et al. 2007; Griffith 2008). Local collinearity in the GWR model can cause strong correlation in pairs of estimated regression coefficients, as is a consequence of collinearity in the traditional linear regression model. Based on existing published findings, researchers should strongly consider using diagnostic tools for collinearity when estimating GWR coefficients.

Conveniently, there are diagnostic tools available to determine if there are substantial collinearity effects present in a GWR model. Simple tools to detect collinearity effects in GWR models include scatter plots of regression coefficients for pairs of regression terms, maps of approximate local regression coefficient correlations (Wheeler and Tiefelsdorf 2005), and local variance inflation factors (VIFs) (Wheeler 2007). The VIF measures how much the estimated variance of a regression coefficient is increased by collinearity. A limitation of the VIF as a diagnostic tool is that it does not consider collinearity with the intercept. More advanced and recommended tools are the variance-decomposition proportions and the associated condition indexes (Belsley 1991; Wheeler 2007). An advantage of the variance-decomposition approach over the VIF is that it measures and conveys the nature of the collinearity among all regression terms at the same time, including the intercept. As motivation for using these tools, applying them to a GWR model to explain crime rates in Columbus, OH clearly linked local collinearity to strong GWR coefficient correlation and increased coefficient variation for two economic status covariates at numerous data locations with counterintuitive positive regression coefficient signs (Wheeler 2007).

The variance-decomposition proportion and condition index diagnostic tools introduced by Belsley (1991) and modified for GWR by Wheeler (2007) use singular value decomposition of the GWR kernel-weighted design matrix to calculate variance-decomposition proportions and condition indexes of the coefficient covariance matrix. The variance-decomposition proportion is the percentage of the variance of a regression coefficient that is explained with any one component of the variance matrix decomposition. The condition index is the ratio of the largest singular value and a smaller singular value of the decomposition. Each variance-


decomposition proportion has an associated condition index. The singular value decomposition of the design matrix in the GWR model is

W_i^{1/2}X = UDVᵀ   (18)

where U and V are orthogonal n × p and p × p matrices, respectively, and D is a p × p diagonal matrix of singular values of W_i^{1/2}X, starting at matrix element (1,1) and decreasing in value down the diagonal. The matrix W_i^{1/2} is the square root of the diagonal weight matrix for calibration location i using a kernel function with the estimated kernel bandwidth from the GWR model. By way of the decomposition, the local variance-covariance matrix of the regression coefficients is

Var[β̂_i] = σ²VD⁻²Vᵀ   (19)

The variance of the local kth regression coefficient is

Var[β̂_ik] = σ² Σ_{j=1}^p v_kj²/e_j²   (20)

where the v_kj's are the elements of the V matrix and the e_j's are the singular values. The variance-decomposition proportion for the local kth regression term and the jth component of the decomposition is

π_jk = φ_kj/φ_k   (21)

where

φ_kj = v_kj²/e_j²   (22)

and

φ_k = Σ_{j=1}^p φ_kj   (23)

The condition index for variance component j = 1, . . . , p is

η_j = e_max/e_j   (24)

where emax is the largest singular value. Belsley (1991) introduced guidelines for using the variance-decomposition proportions and condition indexes in the traditional linear regression setting. Through


experimentation results, Belsley (1991) suggests a conservative value of 30 as a threshold for a condition index which indicates collinearity, although the threshold could be as low as 10 if there are large variance-decomposition proportions for two or more regression terms for the same variance component. In general, larger condition indexes suggest stronger collinearity. A guideline for the variance-decomposition proportions is that the presence of two or more variance-decomposition proportions greater than 0.5 for the same variance component indicates that collinearity exists between those regression terms. It appears reasonable to apply these guidelines for diagnosing collinearity in a GWR model; however, the guidelines have not been systematically studied in the GWR setting. In ArcMap 10.3 software, a condition index above 30 is used to indicate local collinearity issues with the GWR model.

The condition index and variance-decomposition proportion diagnostic tools reveal collinearity locally for the individual GWR models and consequently enable researchers to construct plots of the diagnostic values and link them directly to the estimated GWR coefficients for visual analysis of any collinearity problems present in the model. Estimated GWR coefficients from local models that are diagnosed as problematic should be interpreted with severe caution, and additional analysis should be carried out in these areas to better understand the nature of the relationships being modeled. The variance-decomposition proportions and condition index tools are implemented in the freely available R package gwrr.
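A hedged numpy sketch of these diagnostics for a single calibration location, following Eqs. (18)-(24); the function name is an assumption. Each row of pi sums to one, so Belsley's guidelines (a condition index near 30, or two or more proportions above 0.5 on one component) can be read directly from eta and pi.

```python
import numpy as np

def gwr_collinearity_diagnostics(X, w):
    # Weighted design matrix W_i^{1/2} X and its SVD, Eq. (18)
    Xw = np.sqrt(w)[:, None] * X
    U, e, Vt = np.linalg.svd(Xw, full_matrices=False)
    V = Vt.T
    phi = V ** 2 / e ** 2                       # phi_kj = v_kj^2 / e_j^2, Eq. (22)
    pi = phi / phi.sum(axis=1, keepdims=True)   # variance proportions, Eq. (21)
    eta = e.max() / e                           # condition indexes, Eq. (24)
    return eta, pi
```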

7 Extensions

Several different models have been introduced to extend the concept of geographically weighted regression. Some of the extensions address the issue of collinearity. Collinearity effects in linear regression models have been dampened by constraining the amount of variation in regression coefficients. Two extended versions of GWR have been proposed that are based on the coefficient shrinkage models of ridge regression and the lasso: geographically weighted ridge regression (GWRR; Wheeler 2007) and the geographically weighted lasso (GWL; Wheeler 2009). The methods limit the amount of variation in the coefficients through the addition of a constraint, or penalty, on the size of the regression coefficients. Ridge regression coefficients minimize the sum of the residual sum of squares and a penalty on the size of the squared coefficients:

β̂ᴿ = argmin_β { Σ_{i=1}^n (y_i − β₀ − Σ_{k=1}^{p−1} β_k x_ki)² + λ Σ_{k=1}^{p−1} β_k² }

Cross-Section Spatial Regression Models

J. Le Gallo

… and φ = σ_u²/σ_ν². The spatial interaction implied by Eq. (16) is even more limited than in the moving average model, as it only concerns neighbors of the first and second order contained in the nonzero elements of WW′. Heteroskedasticity is also implied in this specification.

2.4.4 Direct Representation and Non-Parametric Specifications

In this case, the covariance between each pair of error terms is directly specified as an inverse function of the distance between them: cov(e_i, e_j) = σ²f(θ, d_ij), where d_ij is the distance between i and j, σ² is the error variance, and f is the distance function. This


function is a distance decay function that should ensure a positive-definite variance-covariance matrix. This imposes constraints on the functional form, the parameter space, and the metric and scale used for the distance measure. For instance, one might use a negative exponential distance decay function:

E(ee′) = σ²[I_N + γΨ]   (17)

where the off-diagonal elements of Ψ are given by Ψ_ij = e^(−θd_ij), with θ being a non-negative scaling parameter. The diagonal elements of Ψ are set to zero. In contrast to the previous specifications, the direct representation does not induce heteroskedasticity.

An alternative to parametric specifications is to leave the functional form unspecified: these are non-parametric models. We then have cov(e_i, e_j) = f(d_ij), where d_ij is a positive and symmetric distance metric. The regularity conditions on the distance metric have been derived by Conley (1999). The presence of spatial error autocorrelation is often interpreted as a problem in the model specification, such as functional form problems or spatial autocorrelation resulting from a mismatch between the spatial scale of the phenomenon being studied and the spatial scale at which it is measured.

2.5 Encompassing Specifications

An encompassing specification to the spatial lag model, the spatial cross-regressive model and the spatial error model is the unconstrained spatial Durbin model. The latter contains a spatially lagged endogenous variable and the spatial lags of all exogenous variables. More specifically, it is written as:

y = λWy + Xβ + WXδ + u   (18)

The spatial lag model, the spatial cross-regressive model and the spatial error model are found with the appropriate constraints on the parameters, respectively H0: δ = 0, H0: λ = 0 and H0: λβ + δ = 0. LeSage and Pace (2009) provide several motivations for a spatial Durbin model. One is an omitted variable motivation. Indeed, they show that if the linear regression model (1) is affected by an omitted variables problem, and if these omitted variables are spatially correlated and correlated with the included explanatory variables, then unbiased estimates of the coefficients associated with the explanatory variables X can still be obtained by fitting a spatial Durbin model. Other motivations detailed in LeSage and Pace (2009) are based on spatial heterogeneity and model uncertainty.

An even more general model has been suggested by Elhorst (2010) and Halleck Vega and Elhorst (2015): the general nested model:

y = ρWy + Xβ + WXδ + e
e = γWe + u   (19)


All previous models and some others (not all are presented here; see Halleck Vega and Elhorst 2015 for details) can be obtained from this model with the appropriate constraints. The nested nature of all the basic spatial econometric models is at the origin of the specification search strategies presented in section "Specification Tests in Spatial Cross-Sectional Models".

2.6 Higher-Order Spatial Models

In these models, multiple spatially lagged dependent variables and/or multiple spatially lagged error terms are included. For instance, the spatial autoregressive, moving-average SARMA(p,q) process is as follows:

y = Xβ + ρ₁W₁y + ρ₂W₂y + . . . + ρ_pW_py + e
e = λ₁W₁u + λ₂W₂u + . . . + λ_qW_qu + u   (20)

In general, the weights W_i are associated with the ith order of contiguity. We could similarly consider a process where the errors follow a spatial autoregressive process of order q. However, in this case, identification issues may arise (Anselin 1988). It may be that these higher-order processes are the result of a poorly specified spatial weight matrix rather than a realistic data generating process (Anselin and Bera 1998). For instance, if the spatial weight matrix of the model underestimates the true spatial interaction in the data, there will be residual spatial error autocorrelation. This can lead to the estimation of higher-order processes while only a well-specified weight matrix should be necessary. These higher-order models are in fact usually used as alternatives in diagnostic tests. Rejection of the null may then indicate that a different specification of the weights is necessary.

2.7 Heteroskedasticity

Until now, all specifications have assumed iid innovations. However, as we have seen, the sole presence of spatial autocorrelation induces heteroskedasticity in the models. In cross-sectional regression, additional heteroskedasticity is also frequently present. For instance, in the spatial autoregressive error model, we can have:

y = Xβ + e
e = λWe + u with u ∼ iid(0, Ω)   (21)

where Ω is a diagonal matrix. In this case, the variance-covariance matrix of e is:

E(ee′) = (I_N − λW)⁻¹Ω(I_N − λW′)⁻¹   (22)

Several specifications have been used for Ω. In a spatial context, a useful one is that of groupwise heteroskedasticity. When the data are organized into regimes, one variance is estimated for each regime so that Ω has a block-diagonal structure:

Ω = diag(σ₁²I_N₁, . . . , σ_L²I_N_L)   (23)

where L is the number of regimes, N_l, l = 1, . . . , L is the number of observations in regime l, and I_N_l, l = 1, . . . , L is the identity matrix of dimension N_l. In a spatial context, the regimes may have a spatial structure, such as North vs. South or Center vs. Periphery.

The variance can also be specified as a function of variables, such as σ_i² = σ²f(z_i′α), where σ² is a scale parameter, f is some functional form, z_i is a (P × 1) vector of variables, and α_i, i = 1, . . . , P are unknown parameters to estimate. For instance, in a spatial context, Casetti and Can (1999) suggest the DARP (Drift Analysis of Regression Parameters) model: the variance of the error terms is expanded into a monotonic function of the observations' distance from a reference point in an expansion space:

σ_i² = e^(γ₀ + γ₁h_i)   (24)

where hi is the square of the distance between the ith observation and one reference point (such as the Central Business District in a city). The variance-covariance matrix can also be left unspecified as in the non-parametric approach. For instance, Kelejian and Prucha (2007) suggest a non-parametric heteroskedasticity and autocorrelation consistent (HAC) estimator of the variance-covariance matrix in a spatial context, i.e. a SHAC procedure. They assume that the (N  1) disturbance vectors e of model (1) are generated as follows: e = Rξ where R is a (N  N ) non-stochastic matrix whose elements are not known. This disturbance process allows for general patterns of correlation and heteroskedasticity. The asymptotic distribution of the corresponding OLS or instrumental variables (IV) estimators implies the variance-covariance matrix ψ = N1X0ΣX, where Σ = (σ ij) denotes the variance-covariance matrix of e and X is the matrix containing all the explanatory variables in the model (possibly including Wy, if the base model is a spatial lag model). Kelejian and Prucha (2007) then show that the SHAC estimator for its (r,s)th element is:

\hat\psi_{rs} = N^{-1} \sum_{i=1}^{N} \sum_{j=1}^{N} x_{ir} x_{js} \hat e_i \hat e_j K(d_{ij}/d_n) \qquad (25)

where x_{ir} is the ith element of the rth explanatory variable, \hat e_i is the ith element of the OLS or IV residual vector, d_{ij} is the distance between unit i and unit j, d_n is the bandwidth, and K(.) is the kernel function with the usual properties. According to Kelejian (2016), such an option should be privileged: the structure of the error terms should be left unspecified, allowing non-parametrically for the possibility of spatial autocorrelation and heteroskedasticity.
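As an illustration, the following short numpy sketch computes the SHAC estimate in Eq. (25); the function name, the Bartlett kernel choice, and the dense distance matrix are assumptions made for the example, not part of Kelejian and Prucha's (2007) exposition.

import numpy as np

def shac_vcov(X, resid, D, d_n):
    """SHAC estimate of psi = N^{-1} X' Sigma X, Eq. (25).
    X: (N, K) regressors (possibly including Wy); resid: (N,) OLS or IV residuals;
    D: (N, N) pairwise distances d_ij; d_n: bandwidth."""
    N = X.shape[0]
    Z = D / d_n
    K_w = np.where(Z < 1.0, 1.0 - Z, 0.0)   # Bartlett kernel K(d_ij/d_n), an assumed choice
    S = K_w * np.outer(resid, resid)        # kernel-weighted residual cross-products
    return X.T @ S @ X / N                  # (K, K) estimate of psi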

2.8 Parameter Instability

Spatial heterogeneity can also produce parameter instability, i.e. the absence of constancy in some, or all, of the parameters of the regression model. For our purpose, this instability has a spatial dimension: the regression coefficients correspond to a number of distinct spatial regimes. The spatial variability of the coefficients can be discrete, if systematic differences between regimes are observed; in this case, model coefficients are allowed to vary between regimes. It can also be continuous over space. In the absence of spatial autocorrelation, the case of discrete spatial heterogeneity can be readily treated with standard tools such as dummy variables, ANOVA, or spline functions. Recently, some authors have investigated the possibility of spatial heterogeneity affecting the spatial lag or spatial error coefficients. In this case, the methodology consists of estimating higher-order models where the spatial matrices pertain to different spatial regimes rather than to different orders of contiguity. Heterogeneity can also be continuous. In this case, rather than partitioning the cross-sectional sample into regimes, we assume that parameter heterogeneity is location-specific. One possibility is to use geographically weighted regression, labeled GWR (see Wheeler, chapter ▶ “Geographically Weighted Regression”), which is a locally linear, non-parametric estimation method. The base model for one location i is:

y_i = \sum_{k=1}^{K} \beta_{ki} x_{ki} + e_i. \qquad (26)

A different set of parameters is estimated for each observation by using the values of the characteristics taken by neighboring observations. With respect to spatial autocorrelation, Pace and LeSage (2004) have pointed out that if spatial autocorrelation only arises due to inadequately modeled spatial heterogeneity, GWR can potentially eliminate the problem. However, this is not necessarily the case when substantive interactions coexist with parameter heterogeneity. Therefore, Pace and LeSage (2004) have generalized GWR to allow simultaneously for spatial parameter heterogeneity and spatial autocorrelation: the spatial autoregressive local estimation (SALE):

U(i)y = \rho_i U(i)Wy + U(i)X\beta_i + U(i)e \qquad (27)

where U(i) represents a (N × N) diagonal matrix containing distance-based weights for observation i, assigning a weight of one to the m nearest neighbors of observation i and weights of zero to all other observations. The product U(i)y then represents an (m × 1) sub-sample of observations on the dependent variable associated with the m observations nearest in location to observation i. The other products are interpreted in a similar fashion. As m → N, U(i) → I_N, so that the local estimates approach the global estimates of the SAR model as the sub-sample increases.
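The locally weighted logic underlying GWR in Eq. (26) can be sketched in a few lines of Python (numpy); the Gaussian kernel and the bandwidth h are assumed choices for the example, and the SALE estimator in Eq. (27) would additionally estimate ρ_i by maximum likelihood on each m-nearest-neighbor sub-sample.

import numpy as np

def gwr_fit(y, X, coords, h):
    """Location-specific coefficient estimates, one row per observation (Eq. 26)."""
    N, K = X.shape
    betas = np.empty((N, K))
    for i in range(N):
        d2 = ((coords - coords[i]) ** 2).sum(axis=1)    # squared distances to location i
        w = np.exp(-d2 / (2.0 * h ** 2))                # assumed Gaussian distance decay
        Xw = X * w[:, None]
        betas[i] = np.linalg.solve(Xw.T @ X, Xw.T @ y)  # weighted least squares at i
    return betas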

3 Specification Tests in Spatial Cross-Sectional Models

Ignoring spatial effects when they are present has various impacts on the properties of estimates: it may lead to biased and inconsistent estimates of the model parameters when a spatial lag is omitted, or to inefficient estimates and biased inference when spatial error autocorrelation and/or heteroskedasticity is omitted. Specification testing is therefore relevant in applied work and constitutes the topic of this section. We first present Moran’s I test, where the alternative is an unspecified form of spatial autocorrelation. Second, we detail the most commonly used tests of spatial autocorrelation based on maximum likelihood: tests of a single alternative, conditional tests, and robust tests. Indeed, as featured in Pace (chapter on ▶ “Maximum Likelihood Estimation”) and Prucha (chapter on ▶ “Instrumental Variables/Method of Moments Estimation”), there might be some complexities involved in the estimation of spatial processes, based on non-linear optimization (ML or GMM). Consequently, tests based on the Lagrange multiplier (LM) principle (or score test) have been extensively used in specification testing: in contrast to Wald (W) or likelihood ratio (LR) tests, they only necessitate estimation of the model under the null hypothesis, typically the simple regression model in Eq. (1). Third, some strategies aimed at finding the best specification have been devised for the case where the researcher has no a priori knowledge of the form taken by spatial autocorrelation. Finally, we outline the complex interactions between spatial autocorrelation and spatial heterogeneity and discuss how spatial heterogeneity can be tested.

3.1 Moran’s I Test

Moran’s I test is a diffuse test, as the alternative is not a specified form of spatial autocorrelation. It is the two-dimensional analog of the test of temporal correlation among regression residuals in univariate time series (Moran 1950). In matrix notation, it is formally written as:

I = \frac{N}{S_0} \frac{e'We}{e'e} \qquad (28)

where e = y − X\tildeβ is the vector of OLS regression residuals, W is the spatial weight matrix, and S_0 is a standardization factor equal to the sum of all elements of W. For a row-standardized weight matrix W, the factor N/S_0 simplifies to one. The first two moments under the null were derived by Cliff and Ord (1972):

E(I) = \frac{tr(MW)}{N - K} \qquad (29)

V(I) = \frac{tr(MWMW') + tr[(MW)^2] + [tr(MW)]^2}{(N - K)(N - K + 2)} - [E(I)]^2 \qquad (30)

where M is the usual symmetric and idempotent projection matrix M = I_N − X(X'X)^{-1}X'. Inference is then based on the standardized value Z(I) = [I − E(I)]/\sqrt{V(I)}. For normally distributed residuals, Z(I) asymptotically follows a standard normal distribution. Under the null assumption of spatial independence, Moran’s I test is locally best invariant and is also asymptotically equivalent to a likelihood ratio test of H0: λ = 0 in Eq. (9) or of H0: γ = 0 in Eq. (13); it therefore shares the asymptotic properties of these statistics. Moreover, Moran’s I has power against any alternative of spatial correlation, including a spatial lag alternative. In the remainder of the section, we consider tests with a specific alternative, i.e. focused tests, and concentrate on Lagrange multiplier tests that only require estimation of the model under the null hypothesis. Some of these tests are unidirectional, when the alternative deals with one specific misspecification; others are multidirectional, when the alternative comprises various misspecifications.
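For concreteness, a minimal numpy sketch of the test, assuming a row-standardized W so that the factor N/S_0 in Eq. (28) equals one and the moments (29)-(30) apply as stated:

import numpy as np

def moran_i_test(y, X, W):
    """Moran's I for OLS residuals (Eq. 28) with Cliff-Ord moments (29)-(30)."""
    N, K = X.shape
    M = np.eye(N) - X @ np.linalg.inv(X.T @ X) @ X.T   # residual-maker matrix
    e = M @ y                                          # OLS residuals
    I = (N / W.sum()) * (e @ W @ e) / (e @ e)
    MW = M @ W
    E_I = np.trace(MW) / (N - K)
    V_I = (np.trace(MW @ M @ W.T) + np.trace(MW @ MW) + np.trace(MW) ** 2) \
          / ((N - K) * (N - K + 2)) - E_I ** 2
    return I, (I - E_I) / np.sqrt(V_I)                 # Z(I), asymptotically N(0, 1)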

3.2 Tests of a Single Assumption

3.2.1 Spatial Error Autocorrelation
First consider the case where the error terms follow the spatial autoregressive model (9), with e = λWe + u, and test H0: λ = 0. The null corresponds to the classical linear model (1). The Lagrange multiplier statistic can be written the following way (Anselin 1988):

LM_{ERR} = \frac{\left[ e'We/(e'e/N) \right]^2}{T} \qquad (31)

where T = tr[(W' + W)W], tr is the trace operator, and e is the vector of OLS regression residuals. This is equivalent to a scaled Moran coefficient. Since there is only one constraint, under the null this statistic is asymptotically distributed as a χ²(1). The test statistic is the same if we specify the moving average process given by Eq. (13) as the alternative, with the test H0: γ = 0. LM_{ERR} is therefore locally optimal for the two alternatives (autoregressive and moving average). Consequently, when the null is rejected, the test does not provide any indication as to the form of the error process. Pace and LeSage (2008) argue that the test of spatial error autocorrelation can also be performed using a Hausman test, since under the null (model (1)) there are two consistent estimators differing in efficiency (OLS and ML), while under the alternative only one estimator (ML) is efficient.

3.2.2 Kelejian-Robinson Specification
For the specification of the error term suggested by Kelejian and Robinson (1995), a Lagrange multiplier test can also be derived following the same principle. Using the notation of model (15), testing the null H0: φ = 0 yields a statistic of the form (Anselin 2001):

KR = \frac{\left[ e'W'We - T_1 (e'e/N) \right]^2}{2\,(e'e/N)^2 \left( T_2 - T_1^2/N \right)} \qquad (32)

where T_1 = tr(W'W) and T_2 = tr(WW'WW'). Under the null, this statistic is asymptotically distributed as a χ²(1).

3.2.3 Common Factor Test
The common factor test allows choosing between a model with spatial error autocorrelation and a spatial Durbin model. The unconstrained spatial Durbin model in Eq. (18) and the spatial error model in Eq. (9) are equivalent if H0: λβ + δ = 0. This test can be performed with the Lagrange multiplier principle. The corresponding statistic is asymptotically distributed as a χ²(K − 1).

3.2.4 Test of an Endogenous Spatial Lag
In this case, the null hypothesis is H0: ρ = 0 in Eq. (2). The test statistic is (Anselin 1988):

LM_{LAG} = \frac{\left[ e'Wy/(e'e/N) \right]^2}{D} \qquad (33)

with D = (WX\tilde\beta)'M(WX\tilde\beta)/\tilde\sigma^2 + tr(W'W + WW), where \tildeβ and \tildeσ² are the OLS estimates. This statistic is asymptotically distributed as a χ²(1).
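Both unidirectional statistics are straightforward to compute from OLS output; a minimal numpy sketch (function name hypothetical):

import numpy as np

def lm_tests(y, X, W):
    """LM_ERR (Eq. 31) and LM_LAG (Eq. 33) from OLS residuals; both chi2(1)."""
    N = y.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    e = y - X @ beta
    sigma2 = e @ e / N
    T = np.trace((W.T + W) @ W)
    lm_err = (e @ W @ e / sigma2) ** 2 / T              # Eq. (31)
    M = np.eye(N) - X @ XtX_inv @ X.T
    WXb = W @ (X @ beta)
    D = (WXb @ M @ WXb) / sigma2 + T
    lm_lag = (e @ W @ y / sigma2) ** 2 / D              # Eq. (33)
    return lm_err, lm_lag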

3.3 Tests in the Presence of Spatial Autocorrelation or a Spatial Lag

In specification testing, it is useful to know whether the model contains both spatial error autocorrelation and an endogenous spatial lag. In this respect, Anselin et al. (1996) note that LM_{ERR} is the test statistic corresponding to H0: λ = 0 when assuming a correct specification for the rest of the model, that is, ρ = 0. However, if ρ ≠ 0, this test is no longer valid, even asymptotically, since the statistic is no longer distributed as a central χ². Hence, valid statistical inference requires taking account of a possible endogenous spatial lag when testing for spatial error autocorrelation, and vice-versa. Facing this problem, three strategies are possible. First, one can perform a joint test for the presence of an endogenous spatial lag and spatial error autocorrelation; however, if the null is rejected, the exact nature of spatial dependence is not known. Second, one can estimate a model with an endogenous spatial lag and then test for residual spatial error autocorrelation, and vice-versa. Third, Anselin et al. (1996) suggest robust tests, based on the OLS residuals of the simple model, that are capable of taking into account spatial error autocorrelation when testing for an endogenous spatial lag, and vice-versa.


3.3.1 Joint Test
The first approach is a test of the joint null hypothesis H0: λ = ρ = 0 in a model containing both a spatial lag and a spatial error:

y = \rho W_1 y + X\beta + e, \qquad e = \lambda W_2 e + u. \qquad (34)

The Lagrange multiplier test is based on the OLS residuals. The test statistic is (Anselin 1988):

SARMA = \frac{\tilde d_\lambda^2 D + \tilde d_\rho^2 T_{22} - 2\tilde d_\lambda \tilde d_\rho T_{12}}{D T_{22} - T_{12}^2} \qquad (35a)

SARMA = \frac{(\tilde d_\lambda - \tilde d_\rho)^2}{D - T} + \frac{\tilde d_\lambda^2}{T} \quad \text{if } W_1 = W_2 \qquad (35b)

where \tilde d_\lambda = (e'We)/(e'e/N), \tilde d_\rho = (e'Wy)/(e'e/N), T_{ij} = tr[W_iW_j + W_i'W_j], D = (W_1X\tilde\beta)'M(W_1X\tilde\beta)/\tilde\sigma^2 + T_{11}, and T = tr[(W' + W)W]. Under the null, SARMA is asymptotically distributed as a χ²(2). If the null is rejected, the exact nature of spatial dependence is not known. Extensions of these principles to joint tests in SARMA( p, q) models are derived in Anselin (2001).

3.3.2 Conditional Tests
This approach consists of performing a Lagrange multiplier test for one form of spatial dependence when the other form is left unconstrained. For instance, we test H0: λ = 0 in the presence of ρ. The null corresponds to the spatial lag model, whereas the alternative corresponds to the model in Eq. (34). The test is then based on the residuals of model (2) estimated by maximum likelihood. The test statistic is as follows (Anselin 1988):

LM^*_{ERR} = \frac{\left[ \hat e'W_2\hat e / \hat\sigma^2 \right]^2}{T_{22} - (T_{21A})^2 \hat V(\hat\rho)} \qquad (36)

where T_{21A} = tr[W_2W_1A^{-1} + W_2'W_1A^{-1}], A = I_N − \hatρW_1, \hatρ is the maximum likelihood estimator of ρ, and \hat V(\hatρ) is the estimated variance of \hatρ in model (2). Under the null, this statistic is asymptotically distributed as a χ²(1). Conversely, we can also test H0: ρ = 0 in the presence of λ; the test is then based on the maximum likelihood residuals \hat e of the spatial error model (9). The statistic is (Anselin 1988):

LM^*_{LAG} = \frac{\left( \hat e'B'BW_1 y / \hat\sigma^2 \right)^2}{H_\rho - H_{\theta\rho}' \hat V(\hat\theta) H_{\theta\rho}} \qquad (37)

where θ = (β', λ, σ²)', \hatθ is the maximum likelihood estimator of θ, B = I_N − \hatλW_1, and \hat V(\hatθ) is the estimated variance-covariance matrix of \hatθ in model (9). The other terms are:

H_\rho = tr(W_1)^2 + tr(BW_1B^{-1})^2 + \frac{(BW_1X\hat\beta)'(BW_1X\hat\beta)}{\hat\sigma^2} \qquad (38)

H_{\theta\rho} = \begin{bmatrix} (BX)'(BW_1X\hat\beta)/\hat\sigma^2 \\ tr\left[(W_2B^{-1})'BW_1B^{-1}\right] + tr(W_2W_1B^{-1}) \\ 0 \end{bmatrix}. \qquad (39)

Under the null, this statistic is asymptotically distributed as a χ²(1).

3.3.3 Robust Tests
The third approach, suggested by Anselin et al. (1996), consists of using tests that are robust to a local misspecification. For instance, LM_{ERR} is adjusted so that its asymptotic distribution remains a central χ²(1), even in the local presence of ρ. This test can be computed from the OLS residuals of the simple model given by Eq. (1). Assuming for simplicity W_1 = W_2, the modified statistic for the test H0: λ = 0 is:

RLM_{ERR} = \frac{\left[ \tilde d_\lambda - TD^{-1}\tilde d_\rho \right]^2}{T(1 - TD^{-1})}. \qquad (40)

Similarly, the test statistic of H0: ρ = 0 in the local presence of λ is:

RLM_{LAG} = \frac{\left[ \tilde d_\rho - \tilde d_\lambda \right]^2}{D - T}. \qquad (41)
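The robust statistics reuse the same ingredients as the unidirectional tests; a self-contained numpy sketch, assuming W_1 = W_2 = W as in the text:

import numpy as np

def robust_lm_tests(y, X, W):
    """RLM_ERR (Eq. 40) and RLM_LAG (Eq. 41); both chi2(1) under their nulls."""
    N = y.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    e = y - X @ beta
    sigma2 = e @ e / N
    d_lam = e @ W @ e / sigma2                 # score for the error alternative
    d_rho = e @ W @ y / sigma2                 # score for the lag alternative
    T = np.trace((W.T + W) @ W)
    M = np.eye(N) - X @ XtX_inv @ X.T
    WXb = W @ (X @ beta)
    D = (WXb @ M @ WXb) / sigma2 + T
    rlm_err = (d_lam - (T / D) * d_rho) ** 2 / (T * (1.0 - T / D))
    rlm_lag = (d_rho - d_lam) ** 2 / (D - T)
    return rlm_err, rlm_lag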

3.4 Specification Search Strategies

Tests based on the Lagrange multiplier principle have been very popular for specification searches in applied spatial econometrics, as they only require estimation of the model under the null, typically the simple model estimated by OLS. They can be combined to develop a specific-to-general sequential specification search strategy, i.e. a forward stepwise specification search, whenever no a priori spatial specification has been chosen.


The first step consists in estimating the simple model given by Eq. (1) by OLS and in performing Moran’s I test and the SARMA test. Rejection of the null in both cases indicates omitted spatial autocorrelation, but not the form taken by this autocorrelation. If the null hypothesis is rejected, it may also be a sign of model misspecification: using a Monte-Carlo experiment, McMillen (2003) shows that incorrect functional forms or omitted variables that are correlated over space may produce spurious spatial autocorrelation. It may therefore be useful to include additional variables in the model, if possible: either exogenous variables that may eliminate or reduce spatial dependence, or exogenous spatial lags corresponding, in total or in part, to the initial explanatory variables.

If the addition of exogenous variables has not eliminated spatial autocorrelation, a model incorporating a spatial lag and/or a spatial error must be estimated. The choice between these two forms of spatial dependence can be made by comparing the significance levels of LM_{ERR} (see Eq. (31)) and LM_{LAG} (see Eq. (33)) and their robust versions RLM_{ERR} (see Eq. (40)) and RLM_{LAG} (see Eq. (41)): if LM_{LAG} (resp. LM_{ERR}) is more significant than LM_{ERR} (resp. LM_{LAG}), and RLM_{LAG} (resp. RLM_{ERR}) is significant but RLM_{ERR} (resp. RLM_{LAG}) is not, then a spatial lag (resp. a spatial error) must be included in the regression model (Anselin and Florax 1995). Once the spatial lag or the spatial error model has been estimated, three additional tests can be implemented. For a spatial lag model, LM^*_{ERR} allows checking whether an additional spatial error is still necessary; for a spatial error model, LM^*_{LAG} allows checking whether an additional spatial lag is still necessary. The common factor test allows checking whether the restriction H0: λβ + δ = 0 is rejected or not; if it is not rejected, Eq. (18) reduces to the spatial error model (9).

There are several drawbacks to this classical specific-to-general approach. First, the significance levels of the sequence of tests are unknown. Second, every test is conditional on arbitrary assumptions that may be tested later, so the inference is invalid if these assumptions are subsequently rejected. As a consequence, the results of this approach are subject to the order in which the tests are carried out and to whether or not adjustments are made to the significance levels of the sequence of tests.

Alternatively, a general-to-specific search strategy, i.e. a backward stepwise specification search, can be implemented based on the spatial Durbin model (18) or the general model in Eq. (19). Starting for instance with model (18), testing is performed using Wald or likelihood ratio statistics. Failure to reject the common factor constraints then suggests a spatial error model, while rejection of these constraints suggests a spatial lag model. In the first case, the significance of the spatial error coefficient is tested: if it is significant, the final specification is the error model (9); if it is not, the final model is the simple model given by Eq. (1). Likewise, in the second case, the significance of the spatial lag coefficient is tested; if it is not significant, the final model selection is the standard regression model. Simulation experiments performed by Florax et al. (2003) compare the specific-to-general and the general-to-specific strategies and provide some evidence of better performance of the forward strategy, in terms of power and accuracy.

Note, however, that these data-driven specification strategies have lost ground in empirical spatial econometric papers, as current practice tends to start from sound theoretical foundations implying, for instance, that the spatial lag or the spatial Durbin model is the reduced form of a theoretical model of peer effects, fiscal interactions, etc. In this case, there is no need to perform such specification search strategies. Nonetheless, some of the LM tests may come in handy to check the validity of the starting specification from a statistical point of view, by checking for the presence of other forms of spatial autocorrelation. Moreover, because Moran’s I test and the LM tests for spatial autocorrelation may have power against various alternatives, the first step of the strategy, i.e. checking the functional form and thinking of omitted variables generating spurious spatial autocorrelation, remains of utmost importance in applied work.
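The forward decision rule described above can be summarized in a few lines of driver logic; the function below is a hypothetical paraphrase of the rule, with the 5% threshold as an assumed convention rather than part of the original procedure.

def choose_specification(p_lm_lag, p_lm_err, p_rlm_lag, p_rlm_err, alpha=0.05):
    """Anselin-Florax style choice between lag and error alternatives from p-values."""
    if p_lm_lag >= alpha and p_lm_err >= alpha:
        return "no spatial dependence detected: keep the simple model (1)"
    if p_lm_lag < p_lm_err and p_rlm_lag < alpha <= p_rlm_err:
        return "include a spatial lag"
    if p_lm_err < p_lm_lag and p_rlm_err < alpha <= p_rlm_lag:
        return "include a spatial error"
    return "inconclusive: reconsider W, the functional form, or omitted variables"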

3.5 Non-Nested Tests

The basis of the specification search strategies above is that the competing models are nested within a more general model (the spatial Durbin model). For non-nested alternatives, other strategies must be devised. For instance, Kelejian and Piras (2011) have extended the J-test procedure to a spatial framework. The null hypothesis corresponds to a spatial lag model with spatial error autocorrelation, as in Eq. (34) with similar weight matrices, while the alternative hypothesis corresponds to a set of G models that differ from the model under H0 with respect to the specification of the regressor matrix, the weight matrix, the disturbance term, or a combination of these three.

3.6 Spatial Autocorrelation and Spatial Heterogeneity

Spatial autocorrelation and spatial heterogeneity are often both present in regressions. We have already underlined that heteroskedasticity is implied by the presence of a spatial lag or a spatial error term. More generally, these two effects entertain complex links. First, there may be observational equivalence between the two effects in a cross-section (Anselin and Bera 1998). Second, heteroskedasticity and structural instability tests are not reliable in the presence of spatial autocorrelation; conversely, spatial autocorrelation tests are affected by heteroskedasticity. Third, spatial autocorrelation is sometimes the result of unmodeled parameter instability: if space-varying relationships are modeled within a global regression, the error terms may be spatially autocorrelated. All these elements suggest that the two aspects cannot be considered separately. We briefly review here some tests that have tackled this issue.

3.6.1 Spatial Autocorrelation and Heteroskedasticity
First, a joint test of spatial error autocorrelation and heteroskedasticity consists in the sum of a Breusch-Pagan statistic and the LM_{ERR} statistic (Anselin 1988). The resulting statistic is asymptotically distributed as a χ²(P + 1), where P is the number of variables that affect the variance (Eq. (23)). Alternatively, Kelejian and Robinson (1998) derive a joint test for spatial autocorrelation and heteroskedasticity that requires neither the normality assumption for the error terms nor linearity of the regression model.

Conditional tests may also be performed. On the one hand, a Lagrange multiplier test of spatial error autocorrelation in a regression with heteroskedastic error terms may be derived. Let \hatΩ be the estimated diagonal variance-covariance matrix; the heteroskedastic LM statistic then becomes (Anselin 1988):

LM = \frac{\left( e'\hat\Omega^{-1}We \right)^2}{tr(WW + W'\hat\Omega^{-1}W\hat\Omega)} \qquad (42)

where e is the vector of residuals of the heteroskedastic regression. This statistic is asymptotically distributed as a χ²(1). On the other hand, a test of heteroskedasticity in a spatial lag model or a spatial error model can be performed: in the first case, a Breusch-Pagan statistic is computed on the ML residuals, while in the second case it is computed on the spatially filtered residuals of the ML estimation.

3.6.2 Spatial Autocorrelation and Parameter Instability
In the case of discrete parameter heterogeneity under the form of spatial regimes in a homoskedastic model, a test of equality of some or all parameters between regimes can be performed using a standard Chow test. However, when spatial error autocorrelation and/or heteroskedasticity is present, this test must be adjusted. Formally, without loss of generality, consider a model with two regimes:

\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} X_1 & 0 \\ 0 & X_2 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}. \qquad (43)

Let e = (e_1', e_2')' and denote the variance-covariance matrix Ψ = E(ee'). The test of parameter stability is H0: β_1 = β_2. When Ψ = σ²Ω, the test statistic is (Anselin 1988):

C_G = \frac{\hat e_C' \hat\Omega^{-1} \hat e_C - \hat e_L' \hat\Omega^{-1} \hat e_L}{\hat\sigma^2} \qquad (44)

where \hat e_C is the vector of estimated residuals of the constrained model and \hat e_L the vector of estimated residuals of the unconstrained model. This statistic is asymptotically distributed as a χ²(K), where K is the number of explanatory variables in the model. When the break affects the spatial coefficient, Mur et al. (2010) also suggest LM tests. For instance, assume a spatial lag model where a simple break (such as center vs. periphery) only affects the parameter of spatial dependence:

y = \rho_0 Wy + \rho_1 W^* y + X\beta + e \quad \text{with} \quad e \sim iid(0, \sigma^2 I_N) \qquad (45)

where ρ_0 is the spatial lag coefficient pertaining to the second regime, ρ_1 represents the difference between the first and the second regime, and W^* is a weight matrix defined by w^*_{ij} = w_{ij} if location i or location j belongs to the first regime, and w^*_{ij} = 0 otherwise. The LM statistic for the test H0: ρ_1 = 0 is then:

LM^{BREAK}_{LAG} = \frac{\left[ y'W^*\tilde e/\tilde\sigma^2 - tr(\tilde A^{-1}W^*) \right]^2}{\hat\sigma^2} \qquad (46)

where \tilde e is the vector of residuals from the ML estimation of Eq. (2), \tildeσ² is the corresponding estimated variance, \tilde A = I_N − \tildeρW with \tildeρ the ML estimate in Eq. (2), and \hatσ² is the ML estimated variance corresponding to the linear restriction of the null. This statistic is asymptotically distributed as a χ²(1). A spatial error model with a structural break affecting the spatial error parameter would be:

y = X\beta + e, \qquad e = \lambda_0 We + \lambda_1 W^* e + u \quad \text{with} \quad u \sim iid(0, \sigma^2 I_N). \qquad (47)

The LM statistic for the test H0: λ_1 = 0 is as follows:

LM^{BREAK}_{ERR} = \frac{\left[ \tilde e'W^*\tilde B\tilde e/\tilde\sigma^2 - tr(\tilde B^{-1}W^*) \right]^2}{\hat\sigma^2} \qquad (48)

where \tilde e is the vector of residuals of the ML estimation of Eq. (9), \tildeσ² is the corresponding estimated variance, \tilde B = I_N − \tildeλW with \tildeλ the ML estimate in Eq. (9), and \hatσ² is the ML estimated variance corresponding to the linear restriction of the null. This statistic is asymptotically distributed as a χ²(1).
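To fix ideas, a compact numpy sketch of the spatial Chow statistic in Eq. (44), assuming the constrained and unconstrained residuals and the estimates of Ω and σ² have already been obtained (names hypothetical):

import numpy as np

def spatial_chow(e_c, e_l, Omega, sigma2):
    """C_G of Eq. (44); asymptotically chi2(K) under H0: beta_1 = beta_2.
    e_c, e_l: residuals of the constrained and unconstrained models;
    Omega: estimated error covariance structure (Psi = sigma2 * Omega);
    sigma2: estimated scale parameter."""
    Omega_inv = np.linalg.inv(Omega)
    return (e_c @ Omega_inv @ e_c - e_l @ Omega_inv @ e_l) / sigma2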

4 Conclusion

The objective of this chapter was to provide a selective review of specification issues in spatial econometrics. We focused on the way spatial effects may be incorporated into regression models and on specification testing. Beginning with the most commonly used spatial specifications in a cross-sectional setting, namely linear regression models including a spatial lag and/or a spatial error term, heteroskedasticity, or parameter instability, we presented a set of specification tests that allow checking for deviations from a standard, i.e. non-spatial, regression model. A great deal of space has been devoted to LM tests, as they only require estimation of the model under the null. Unidirectional, multidirectional, and robust LM tests are now in the standard toolbox of spatial econometrics. They are still frequently used in applied work, even though the technical and numerical difficulties associated with the estimation of spatial models for very large samples have become much more tractable. Finally, because of the complex links between spatial autocorrelation and spatial heterogeneity, we have devoted space to specifications incorporating both aspects and to the associated specification tests.

References

Anselin L (1988) Spatial econometrics: methods and models. Kluwer, Dordrecht
Anselin L (2001) Rao’s score test in spatial econometrics. J Statist Plann Inference 97:113–139
Anselin L (2010) Thirty years of spatial econometrics. Pap Reg Sci 89:3–25
Anselin L, Bera AK (1998) Spatial dependence in linear regression models with an application to spatial econometrics. In: Ullah A, Giles DEA (eds) Handbook of applied economic statistics. Springer, Berlin
Anselin L, Florax RJGM (1995) Small sample properties of tests for spatial dependence in regression models: some further results. In: Anselin L, Florax RJGM (eds) New directions in spatial econometrics. Springer, Berlin
Anselin L, Bera AK, Florax RJGM, Yoon M (1996) Simple diagnostic tests for spatial dependence. Reg Sci Urban Econ 26:77–104
Arbia G (2011) A lustrum of SEA: recent research trends following the creation of the Spatial Econometrics Association (2007–2011). Spat Econ Anal 6:377–395
Casetti E, Can A (1999) The econometric estimation and testing of DARP models. J Geogr Syst 1:91–106
Cliff A, Ord JK (1972) Testing for spatial autocorrelation among regression residuals. Geogr Anal 4:267–284
Conley TG (1999) GMM estimation with cross-sectional dependence. J Econ 92:1–44
Elhorst JP (2010) Applied spatial econometrics: raising the bar. Spat Econ Anal 5:9–28
Florax RJGM, Folmer H, Rey SJ (2003) Specification searches in spatial econometrics: the relevance of Hendry’s methodology. Reg Sci Urban Econ 33:557–579
Fotheringham AS, Brunsdon C, Charlton M (2004) Geographically weighted regression: the analysis of spatially varying relationships. Wiley, Chichester
Halleck Vega S, Elhorst JP (2015) The SLX model. J Reg Sci 55:339–363
Kelejian HH (2016) Critical issues in spatial models: error term specifications, additional endogenous variables, pre-testing, and Bayesian analysis. Lett Spat Resour Sci 9:113–136
Kelejian HH, Piras G (2011) An extension of Kelejian’s J-test for non-nested spatial models. Reg Sci Urban Econ 41:281–292
Kelejian HH, Piras G (2017) Spatial econometrics. Elsevier, Amsterdam
Kelejian HH, Prucha IR (2007) HAC estimation in a spatial framework. J Econ 140:131–154
Kelejian HH, Robinson DP (1995) Spatial correlation: a suggested alternative to the autoregressive model. In: Anselin L, Florax RJGM (eds) Advances in spatial econometrics. Springer, Heidelberg
Kelejian HH, Robinson DP (1998) A suggested test for spatial autocorrelation and/or heteroskedasticity and corresponding Monte-Carlo results. Reg Sci Urban Econ 28:389–417
LeSage JP, Pace RK (2009) Introduction to spatial econometrics. CRC Press, Boca Raton
McMillen DP (2003) Spatial autocorrelation or model misspecification? Int Reg Sci Rev 26:208–217
Moran P (1950) A test for the serial dependence of the residuals. Biometrika 35:255–260
Mur J, Lopez F, Angulo A (2010) Instability in spatial error models: an application to the hypothesis of convergence in the European case. J Geogr Syst 12:259–280
Pace RK, LeSage JP (2004) Spatial autoregressive local estimation. In: Getis A, Mur J, Zoller H (eds) Recent advances in spatial econometrics. Palgrave Macmillan, Basingstoke
Pace RK, LeSage JP (2008) A spatial Hausman test. Econ Lett 101:282–284

Spatial Panel Models and Common Factors

J. Paul Elhorst

Contents
1 Introduction
2 Linear Spatial Dependence Models for Cross-Section Data
3 Linear Spatial Dependence Models for Panel Data
4 Dynamic Linear Spatial Dependence Models for Panel Data
5 Direct, Indirect, and Spatial Spillover Effects
6 Empirical Illustration
7 Conclusion
8 Cross-References
References

Abstract

This chapter provides a survey of the existing literature on spatial panel data models. Static models, dynamic models, and dynamic models with common factors are considered. Common factors are modeled by time-period fixed effects, cross-sectional averages, or principal components. It is demonstrated that spatial econometric models that include lags of the dependent variable and of the independent variables in both space and time provide a useful tool to quantify the magnitude of direct and indirect effects, both in the short term and in the long term. Direct effects can be used to test the hypothesis as to whether a particular variable has a significant effect on the own dependent variable, and indirect effects to test the hypothesis as to whether spatial spillovers affect the dependent variables of other units. To illustrate these models, their effects estimates, and the impact of the type of common factors, a demand model for cigarettes is estimated based on panel data from 46 U.S. states over the period 1963 to 1992.

J. P. Elhorst (*)
Department of Economics, Econometrics and Finance, University of Groningen, Groningen, The Netherlands
e-mail: [email protected]

© Springer-Verlag GmbH Germany, part of Springer Nature 2021
M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_86


Keywords

Spatial panels · Dynamic effects · Spatial spillovers · Common factors · Estimation

1 Introduction

Spatial econometrics deals with interaction effects among geographical units, such as zip codes, neighborhoods, municipalities, counties, regions, states, or countries. Examples are economic growth rates of OECD countries over T years, monthly unemployment rates of EU regions in the last decade, and annual tax rate changes of all jurisdictions in a country since the last election. Spatial econometric models are also used to explain the behavior of economic agents other than geographical units, such as individuals, firms, or governments. The best example is a hedonic price equation in which the price of each house is explained by the prices and characteristics of other houses that were sold previously in the neighborhood of that house (Kelejian and Piras 2017, p. 13). In modeling terms, three different types of interaction effects can be distinguished: endogenous interaction effects among the dependent variable (Y), exogenous interaction effects among the independent variables (X), and interaction effects among the error terms (e). Originally, the central focus of spatial econometrics was on one type of interaction effect in a single-equation cross-section setting (chapter ▶ “Cross-Section Spatial Regression Models”). Usually, the point estimate of the coefficient of this interaction effect was used to test the hypothesis as to whether spatial spillover effects exist. Most of the work was inspired by research questions arising in regional science and economic geography, where the units of observation are geographically determined and the structure of the dependence among these units can somehow be related to location and distance. However, more recently, the focus has shifted to models with more than one type of interaction effect, to panel data, to dynamic specifications, and to the marginal effects of the explanatory variables in the model rather than the point estimates of the interaction effects. In this chapter, we review and organize this recent literature. In Sect. 2, we present the linear regression model with spatial interaction effects for cross-section data and, in Sect. 3, its extension to panel data. In Sect. 4, the latter model is further extended to include dynamic effects in both space and time, as well as common factors. In Sect. 5, we provide so-called “effects estimates” (chapter ▶ “Interpreting Spatial Econometric Models”), which are required for making correct inferences regarding the effect of independent variables on the dependent variable. In Sect. 6, we estimate a demand model for cigarettes based on panel data from 46 U.S. states over the period 1963 to 1992 to empirically illustrate the different models. This data set is taken from Baltagi (2005) and has been used for illustration purposes in other studies too. Finally, we conclude this chapter with a number of important implications for the econometric modeling of relationships based on spatial panel data.

2 Linear Spatial Dependence Models for Cross-Section Data

The standard approach in most empirical work is to start with a non-spatial linear regression model and then to test whether or not the model needs to be extended with spatial interaction effects. This approach is known as the specific-to-general approach. The non-spatial linear regression model takes the form

Y = \alpha\iota_N + X\beta + e \qquad (1)

where Y denotes an N × 1 vector consisting of one observation on the dependent variable for every unit in the sample (i = 1, . . . , N), ι_N is an N × 1 vector of ones associated with the constant term parameter α, X denotes an N × K matrix of exogenous explanatory variables with the associated parameters β contained in a K × 1 vector, and e = (e_1, . . . , e_N)ᵀ is a vector of disturbance terms, where the e_i are independently and identically distributed error terms for all i with zero mean and variance σ². Since the linear regression model is commonly estimated by Ordinary Least Squares (OLS), it is often labeled the OLS model. Furthermore, even though the OLS model in most studies focusing on spatial interaction effects is rejected in favor of a more general model, its results often serve as a benchmark.

The opposite approach is to start with a more general model that nests a series of simpler models, which would ideally represent all alternative economic hypotheses requiring consideration. Generally, three different types of interaction effects may explain why an observation associated with a specific location may be dependent on observations at other locations:

i. Endogenous interaction effects, where the decision of a particular unit A (or its economic decision makers) to behave in some way depends on the decision taken by other units, among which, say, unit B:

Dependent variable y of unit A ↔ Dependent variable y of unit B. \qquad (2)

Endogenous interaction effects are typically considered as the formal specification for the equilibrium outcome of a spatial or social interaction process, in which the value of the dependent variable for one agent is jointly determined with that of the neighboring agents. In the empirical literature on strategic interaction among local governments, for example, endogenous interaction effects are theoretically consistent with the situation where taxation and expenditures on public services interact with taxation and expenditures on public services in nearby jurisdictions.

ii. Exogenous interaction effects, where the decision of a particular unit to behave in some way depends on independent explanatory variables of the decisions taken by other units:

Independent variable x of unit B → Dependent variable y of unit A. \qquad (3)


Consider, for example, the savings rate. According to standard economic theory, saving and investment are always equal. People cannot save without investing their money somewhere, and they cannot invest without using somebody’s savings. This is true for the world as a whole, but it is not true for individual economies: capital can flow across borders, hence the amount an individual economy saves does not have to be the same as the amount it invests. In other words, per capita income in one economy also depends on the savings rates of neighboring economies. It should be stressed that, if the number of independent explanatory variables in a linear regression model is K, the number of exogenous interaction effects may also be K, provided that the intercept is considered as a separate variable. In other words, not only the savings rate but also other explanatory variables may affect per capita income in neighboring economies. It is for this reason that economic growth, in both the theoretical and the empirical literature on economic growth and convergence among countries or regions, is taken to depend not only on the initial income level and the rates of saving, population growth, technological change, and depreciation in the own economy, but also on those of neighboring economies (Ertur and Koch 2007).

iii. Interaction effects among the error terms:

Error term u of unit A ↔ Error term u of unit B. \qquad (4)

Interaction effects among the error terms do not require a theoretical model for a spatial or social interaction process but, instead, are consistent with a situation where determinants of the dependent variable omitted from the model are spatially autocorrelated, or where unobserved shocks follow a spatial pattern. Interaction effects among the error terms may also be interpreted to reflect a mechanism to correct rent-seeking politicians for unanticipated fiscal policy changes (Allers and Elhorst 2005).

A full model with all types of interaction effects, known as the general nesting spatial (GNS) model, takes the form

Y = \rho WY + \alpha\iota_N + X\beta + WX\theta + u, \qquad u = \lambda Wu + e \qquad (5)

where the variable WY denotes the endogenous interaction effects among the dependent variables, WX the exogenous interaction effects among the independent variables, and Wu the interaction effects among the disturbance terms of the different units. ρ is called the spatial autoregressive coefficient and λ the spatial autocorrelation coefficient, while θ, just as β, represents a K × 1 vector of fixed but unknown parameters. W is a nonnegative N × N matrix of known constants describing the arrangement of the units in the sample. Its diagonal elements are set to zero by assumption, since no unit can be viewed as its own neighbor.

Three methods have been developed to estimate models that include interaction effects: one is based on maximum likelihood (ML) or quasi-maximum likelihood (QML), one on instrumental variables or the generalized method of moments (IV/GMM), and one on the Bayesian Markov Chain Monte Carlo (MCMC) approach. QML and IV/GMM estimators are different in that they do not rely on the assumption of normality of the disturbances. Detailed descriptions of these estimation techniques can be found in Anselin (1988), LeSage and Pace (2009), Kelejian and Piras (2017), and the chapters ▶ “Maximum Likelihood Estimation”, ▶ “Bayesian MCMC Estimation”, and ▶ “Instrumental Variables/Method of Moments Estimation”. A minimal sketch of the ML approach for the simplest of these models is given below Table 1. Technically, there are no obstacles to estimating the GNS model with interaction effects among the dependent variable, the independent variables, and the disturbance terms. Often, however, the parameters cannot be interpreted in a meaningful way, since the different types of interaction effects cannot be distinguished from each other due to overparameterization (Burridge et al. 2017). This is one of the reasons why the spatial econometrics literature distinguishes six simpler types of spatial econometric models. The differences between these models depend on the type and number of spatial interaction effects that are included. The first two columns of Table 1 provide an overview, including their designations and abbreviations. The last column will be further discussed in Sect. 5.

Table 1 Spatial econometric models with different combinations of spatial interaction effects and their flexibility regarding spatial spillovers

Type of model                                    Interaction(s)   Flexibility spatial spillovers
SAR, Spatial autoregressive model (a)            WY               Constant ratios
SEM, Spatial error model                         Wu               Zero by construction
SLX, Spatial lag of X model                      WX               Fully flexible
SAC, Spatial autoregressive combined model (b)   WY, Wu           Constant ratios
SDM, Spatial Durbin model                        WY, WX           Fully flexible
SDEM, Spatial Durbin error model                 WX, Wu           Fully flexible
GNS, General nesting spatial model               WY, WX, Wu       Fully flexible

(a) Also known as the spatial lag model
(b) Also known as the SARAR model
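As announced above, the following bare-bones Python sketch estimates the SAR model Y = ρWY + Xβ + e by maximizing the log-likelihood concentrated in ρ; it is a didactic implementation under the assumption of a row-standardized W, not a substitute for the optimized routines in the cited references.

import numpy as np
from scipy.optimize import minimize_scalar

def sar_qml(y, X, W):
    """(Quasi-)ML estimation of y = rho*Wy + X beta + e via concentration."""
    N = y.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    b0 = XtX_inv @ (X.T @ y)            # auxiliary regression of y on X
    bL = XtX_inv @ (X.T @ (W @ y))      # auxiliary regression of Wy on X
    e0, eL = y - X @ b0, W @ y - X @ bL
    eig = np.linalg.eigvals(W)          # eigenvalues for the log-determinant term

    def negloglik(rho):
        sigma2 = (e0 - rho * eL) @ (e0 - rho * eL) / N
        logdet = np.sum(np.log(1.0 - rho * eig)).real   # ln|I - rho*W|
        return -(logdet - 0.5 * N * np.log(sigma2))

    rho = minimize_scalar(negloglik, bounds=(-0.99, 0.99), method="bounded").x
    return rho, b0 - rho * bL           # rho_hat, beta_hat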

3 Linear Spatial Dependence Models for Panel Data

In recent years, the spatial econometrics literature has exhibited a growing interest in the specification and estimation of econometric relationships based on spatial panels. This interest can be explained partly by the increased availability of data sets in which a number of spatial units are followed over time, and partly by the fact that panel data offer researchers extended modeling possibilities compared to the single-equation cross-sectional setting. Panel data are generally more informative, and they contain more variation and less collinearity among the variables. The use of panel data results in a greater availability of degrees of freedom, and hence increases efficiency in the estimation. Panel data also allow for the specification of more complicated behavioral hypotheses, including effects that cannot be addressed using pure cross-sectional data (see Baltagi 2005 and the references therein).

The extension of the spatial econometric model presented in Eq. (5) for a cross-section of N observations to a space-time model for a panel of N observations over T time periods is obtained by adding a subscript t, which runs from 1 to T, to the variables and the error terms of that model:

Y_t = \rho WY_t + \alpha\iota_N + X_t\beta + WX_t\theta + u_t, \qquad u_t = \lambda Wu_t + e_t. \qquad (6)

This model can be estimated along the same lines as the cross-sectional model, provided that all notations are adjusted from one cross-section to T cross-sections of N observations. However, the main objection to pooling the data like this is that the resulting model does not account for spatial and temporal heterogeneity.

Spatial units are likely to differ in their background variables, which are usually space-specific, time-invariant variables that do affect the dependent variable but are difficult to measure or hard to obtain. Examples of such variables abound: one spatial unit is located at the seaside, the other just at the border; one spatial unit is a rural area located in the periphery of a country, the other an urban area located in the center; norms and values regarding labor, crime, and religion in one spatial unit might differ substantially from those in another unit, etc. Failing to account for these variables increases the risk of obtaining biased estimation results. One remedy is to introduce a variable intercept μ_i representing the effect of the omitted variables that are peculiar to each spatial unit considered. In sum, spatial specific effects control for all time-invariant variables whose omission could bias the estimates in a typical cross-sectional study. Similarly, the justification for adding time-period specific effects is that they control for all spatially invariant variables whose omission could bias the estimates in a typical time-series study (Baltagi 2005). Examples of such variables also exist: one year is marked by economic recession, another by a boom; changes in legislation or government policy can significantly affect the functioning of an economy as from the date of implementation, as a result of which data points observed before and after that date might be significantly different from one another.

The space-time model in Eq. (6) extended with spatial specific and time-period specific effects reads as

Y_t = \rho WY_t + \alpha\iota_N + X_t\beta + WX_t\theta + \mu + \xi_t\iota_N + u_t, \qquad u_t = \lambda Wu_t + e_t \qquad (7)

where μ = (μ_1, . . . , μ_N)ᵀ. The spatial and time-period specific effects may be treated as fixed effects or as random effects. In the fixed effects model, a dummy variable is introduced for each spatial unit and for each time period (except one, to avoid perfect multicollinearity), while in the random effects model, μ_i and ξ_t are treated as random


variables that are independently and identically distributed with zero mean and variances σ²_μ and σ²_ξ, respectively. Furthermore, it is assumed that the random variables μ_i, ξ_t, and e_t are independent of each other. The estimation of static spatial panel data models is extensively discussed in Elhorst (2014) and Lee and Yu (2015). Elhorst (2014) presents the ML estimator of the spatial lag model and of the spatial error model extended to include fixed effects or random effects. Halleck Vega and Elhorst (2015) further note that the spatial Durbin model can be estimated as a spatial lag model, and the spatial Durbin error model as a spatial error model, with explanatory variables [X WX] instead of X. The response parameters of the fixed effects models can be estimated by concentrating out the fixed effects first, called demeaning (see Baltagi 2005; Elhorst 2014 for mathematical details). The resulting equation can then be estimated by the ML estimation procedure developed by Anselin (1988) for the spatial lag model, provided that this procedure is generalized from one single cross-section of N observations to T cross-sections of N observations. The estimation of the random effects model is somewhat more complicated (see Elhorst 2014 for details). Lee and Yu (2015) show that the ML estimator of the spatial lag and of the spatial error model with spatial and time-period fixed effects, as set out in Elhorst (2014), will yield inconsistent parameter estimates if both N and T are large. To correct for this, they propose a simple bias correction procedure based on the parameter estimates of the uncorrected approach. If time-period fixed effects are not included, or if time-period fixed effects are included and T is fixed, only the variance of the error term, σ², needs to be bias-corrected. Elhorst (2014) provides Matlab routines at his website www.spatial-panels.com for both the fixed effects and the random effects spatial lag model, as well as the fixed effects and the random effects spatial error model, accounting for this bias correction procedure.
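The demeaning step itself is easy to reproduce; a numpy sketch for a balanced panel, with an assumed (T, N) data layout (the subsequent ML step and the Lee-Yu bias correction are in the cited references and are not reproduced here):

import numpy as np

def two_way_within(Z):
    """Two-way within transformation z_it - zbar_i - zbar_t + zbar
    for a balanced panel stored as a (T, N) or (T, N, K) array."""
    Z = np.asarray(Z, dtype=float)
    zbar_i = Z.mean(axis=0, keepdims=True)     # unit means (spatial fixed effects)
    zbar_t = Z.mean(axis=1, keepdims=True)     # period means (time-period fixed effects)
    zbar = Z.mean(axis=(0, 1), keepdims=True)  # overall mean
    return Z - zbar_i - zbar_t + zbar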

4 Dynamic Linear Spatial Dependence Models for Panel Data

To make the spatial panel data model presented in Eq. (7) dynamic, one might add time lags of the variables Y_t and WY_t, to get

Y_t = \tau Y_{t-1} + \rho WY_t + \eta WY_{t-1} + \alpha\iota_N + X_t\beta + WX_t\theta + \mu + \xi_t\iota_N + u_t. \qquad (8)

This model is known as the dynamic spatial Durbin model (Debarsy et al. 2012). In addition, one might consider time lags of the variables X_t and WX_t, and of the error terms u_t and Wu_t. According to Anselin et al. (2008), the parameters of the model in Eq. (8) and of models with these extensions are not identified, but Lee and Yu (2016) provide an econometric-theoretical proof showing the opposite. Three methods have been developed in the literature to estimate models that have mixed dynamics in both space and time: one is to bias-correct the maximum likelihood (ML) or quasi-maximum likelihood (QML) estimator, one is based on instrumental variables or the generalized method of moments (IV/GMM), and one utilizes the Bayesian Markov Chain Monte Carlo (MCMC) approach. Detailed descriptions of these estimation techniques can be found in Lee and Yu (2015) and Parent and LeSage (2010). In addition, Elhorst (2010) and Parent and LeSage (2010) treat the first-period cross-section as endogenous and find that, when applying respectively the ML and the Bayesian MCMC estimators, the correct treatment of the initial observations (endogenous instead of exogenous) is important, especially when T is small.

One objection to the model in Eq. (8) is that each time dummy ξ_t has the same homogenous impact on all observations in period t, while it is likely that, for example, business cycle effects hit some units harder than others. Similarly, each spatial fixed effect μ_i has the same homogenous impact on all observations of unit i. An alternative is to replace the spatial and time-period fixed effects by common factors that have a heterogenous impact on the observations:

Y_t = \tau Y_{t-1} + \rho WY_t + \eta WY_{t-1} + \alpha\iota_N + X_t\beta + WX_t\theta + \sum_r \Gamma_r f_{rt} + u_t \qquad (9)

where f_{rt} denotes the value of the rth common factor at time t and Γ_r is the corresponding N × 1 vector of coefficients. If two factors are considered, ( f_{1t}) = (1, . . . , 1)ᵀ and ( f_{2t}) = (ξ_1, . . . , ξ_T)ᵀ, and the parameter restrictions Γ_1ᵀ = (μ_1, . . . , μ_N) and Γ_2ᵀ = (1, . . . , 1) are imposed, the model in Eq. (8) is obtained. This implies that controls for spatial and time-period fixed effects can be considered as two common factors. Formally, the spatial fixed effects represent one common factor ( f_{1t}) which is constant over time but has heterogenous coefficients (Γ_1). The time-period fixed effects represent another common factor of length T ( f_{2t}) which changes over time but has homogenous coefficients (Γ_2). The total number of common factor parameters to be estimated in this setting amounts to N + T.

Instead of spatial and time-period fixed effects, common factors can also be measured by time-specific cross-sectional averages of the observable variables in the model or by principal components. A detailed summary of the econometric literature in this field can be found in Pesaran (2015a, Sect. 29.4). The idea to replace time-period effects by cross-sectional averages dates back to Pesaran (2006). In view of Eq. (9), he suggests to use one or more variables from the set of cross-sectional averages, \bar Y_t = N^{-1}\sum_{i=1}^{N} Y_{it}, \bar Y_{t-1} = N^{-1}\sum_{i=1}^{N} Y_{i,t-1}, and \bar X_{kt} = N^{-1}\sum_{i=1}^{N} X_{ikt} (k = 1, . . . , K), as common factors, such that each common factor has N spatial-specific estimates, one for each unit in the sample. Together with the remaining N spatial fixed effects, the total number of common factor parameters to be estimated, if all cross-sectional averages are included, then increases to N + (2 + K)N. Just as time-period fixed effects, these cross-sectional averages may be treated as exogenous explanatory variables, based on the assumption that the contribution of each unit to the cross-sectional averages at a particular point in time goes to zero as N goes to infinity (Pesaran 2006, assumption 5 and remark 3). Since the number of parameters to be estimated increases rapidly with the number of common factors, most empirical studies try to keep the number of cross-sectional averages to a minimum. Ciccarelli and Elhorst (2018) find that, using cigarette demand data for 69 Italian regions over the period 1877–1913, controlling for \bar Y_t and \bar Y_{t-1} only already effectively filters out


the common time trends in the data. The authors of this paper used Matlab routines to estimate their model. However, it is also possible to estimate the parameters in Eq. (9) using the xsmle command in Stata (Belotti et al. 2017), illustrated in the textbox below.

The command in Stata running a dynamic spatial panel data model with spatial fixed effects and common factor \bar Y_t reads as

xsmle Y X Y1t . . . YNt, wmat(W) model(sdm) durbin(X) dlag(3) fe type(ind) effects nsim(1000)

The option type(ind) controls for spatial fixed effects, and the variable list Y1t . . . YNt controls for the cross-sectional average of the dependent variable with N unit-specific coefficients. Time dummies should not be included, to avoid (near) perfect multicollinearity. The data structure to read in the cross-sectional averages takes the following form:

Unit  Time  Unit 1  Unit 2  ...  Unit N
1     1     Ȳ1      0       ...  0
2     1     0       Ȳ1      ...  0
⋮     ⋮     ⋮       ⋮            ⋮
N     1     0       0       ...  Ȳ1
⋮     ⋮     ⋮       ⋮            ⋮
1     T     ȲT      0       ...  0
2     T     0       ȲT      ...  0
⋮     ⋮     ⋮       ⋮            ⋮
N     T     0       0       ...  ȲT
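Building this block structure from a long-format panel is mechanical; a pandas sketch with hypothetical column names:

import pandas as pd

def add_csa_columns(df, unit="unit", time="time", y="Y"):
    """Append columns Y1t, ..., YNt: column i equals the cross-sectional
    average of Y at time t for the rows of unit i, and zero elsewhere."""
    out = df.copy()
    ybar = out.groupby(time)[y].transform("mean")   # \bar{Y}_t attached to each row
    for i, u in enumerate(sorted(out[unit].unique()), start=1):
        out[f"Y{i}t"] = (out[unit] == u) * ybar
    return out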

The second possibility is to approximate the unobservable common factors by one or more principal components. In that case, the Γ parameters represent the factor loadings of the principal components. Shi and Lee (2017) developed a QML estimator for the model in Eq. (9), including a spatially autocorrelated error term u_t = λWu_t + e_t. This estimator does not require any specification of the distribution function of the disturbance term. The coefficient estimates are bias-corrected for the Nickell bias and for the impact of this bias on the other coefficients in the equation. For this purpose, a Matlab routine called SFactors has been developed, which the first author (Shi) made available at his website www.w-shi.net. A potential disadvantage of principal components is that they are often difficult to interpret, especially if they are compared with cross-sectional averages. Every principal component requires the estimation of 2N parameters.

To find out which set of common factors filters out the common variation in the data most effectively, the cross-sectional dependence (CD) test developed by Pesaran (2015b) may be used. This test is based on the correlation coefficients between the time-series


observations of each pair of units with respect to a particular variable, in this case the residuals of Eq. (9), resulting in N(N − 1) correlations. Denoting the estimated correlation coefficient between the time series for units i and j as \hatκ_{ij}, the test statistic is defined as

CD = \sqrt{\frac{2T}{N(N-1)}} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \hat\kappa_{ij}.

The N(N − 1) mutual correlation coefficients, each calculated over T observations, explain respectively the division by N(N − 1) and the multiplication by T in the expression \sqrt{2T/[N(N-1)]}. The number two is added since the correlation matrix is symmetric: it is sufficient to calculate the CD statistic over the upper-triangular elements of the correlation matrix only and to multiply the outcome by two, so as to also represent the impact of the lower-triangular elements. The null hypothesis of the CD test assumes the existence of weak cross-sectional dependence in the data (Pesaran 2015b, Theorem 3), better known as local spatial dependence, while the alternative hypothesis reflects strong cross-sectional dependence, such as common factors. The CD statistic is a two-sided test statistic whose limiting distribution converges to the standard normal distribution, with −1.96 and 1.96 as critical values at the 5% significance level, provided that N goes to infinity faster than T or that T is fixed, the latter reflecting the case in most spatial econometric studies.
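A minimal numpy sketch of the CD statistic, taking the (N × T) residual matrix of Eq. (9) as input (function name hypothetical):

import numpy as np

def cd_statistic(resid):
    """Pesaran's CD test from an (N, T) residual array; approximately
    N(0, 1) under the null of weak cross-sectional dependence."""
    N, T = resid.shape
    R = np.corrcoef(resid)                  # pairwise correlations over T periods
    kappa = R[np.triu_indices(N, k=1)]      # the N(N-1)/2 upper-triangular coefficients
    return np.sqrt(2.0 * T / (N * (N - 1))) * kappa.sum()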

5

Direct, Indirect, and Spatial Spillover Effects

Many empirical studies use point estimates of one or more spatial regression model specifications to test the hypothesis as to whether or not spatial spillovers exist. One of the key contributions of LeSage and Pace's book (2009, p. 74) is the observation that this may lead to erroneous conclusions, and that a partial derivative interpretation of the impact from changes to the variables of different model specifications represents a more valid basis for testing this hypothesis. Rewriting the spatial econometric model with dynamic effects in space and time in Eq. (9) as

Yt = (I_N − ρW)^{−1}(τI_N + ηW)Yt−1 + (I_N − ρW)^{−1}(Xtβ + WXtθ) + R,   (10)

where R is a rest term covering all the remaining terms (intercept, spatial and time-period fixed effects, common factors, and/or the error terms), the matrix of partial derivatives of the expectation of Yt, E(Yt), with respect to the kth explanatory variable of Xt in unit 1 up to unit N can be seen to be

[∂E(Yt)/∂X1kt ... ∂E(Yt)/∂XNkt] = (I_N − ρW)^{−1}(I_N βk + W θk).   (11)

This N × N matrix of partial derivatives denotes the effects of a change of a particular explanatory variable in a particular unit on the dependent variable of all other units in the short term. Note that this N × N matrix is actually the product of two N × N matrices. The elements of the first matrix, the inverse of (I_N − ρW), better known as the spatial multiplier matrix, are not worked out further, since a simple analytical expression for this inverse does not exist. Similarly, the long-term effects can be seen to be

[∂E(Yt)/∂X1kt ... ∂E(Yt)/∂XNkt] = ((1 − τ)I_N − (ρ + η)W)^{−1}(I_N βk + W θk).   (12)

LeSage and Pace (2009) and Debarsy et al. (2012) define the direct effect as the average of the diagonal elements of the matrix on the right-hand side of Eq. (11) or (12), and the indirect effect as the average of either the row sums or the column sums of the off-diagonal elements of these matrices (since the numerical magnitudes of these two calculations of the indirect effect are the same, it does not matter which one is used). If the spatial weight matrix W does not change over time, the outcomes are independent of the time index, which explains why the right-hand sides of these equations do not contain the symbol t. A more appealing synonym for the indirect effect is spatial spillover effect, a term that will therefore be used in the remainder of this chapter.

Using the expressions in Eqs. (11) and (12), it is also possible to indicate the disadvantages of certain parameter restrictions adopted in previous studies (see Elhorst 2014 for an overview). Especially the SAR, SEM and SAC models are of limited use in empirical research due to initial restrictions on the spillover effects they can potentially produce. In the SAR (θ = λ = 0) and the SAC (θ = 0) models, the ratio between the spillover effect and the direct effect is the same for every explanatory variable; if this ratio happens to be p percent for one variable, it is also p percent for any other variable, both in the short and in the long term. In the SEM model the spillover effects are even set to zero by construction: if both ρ = 0 and θk = 0, short-term spillover effects do not occur, while long-term spillover effects do not occur if both ρ = −η and θk = 0. This loss of flexibility makes these models less suitable for empirical research focusing on spillover effects. Only in the SLX, SDEM, SDM and GNS models can the spillover effects take any value. These findings are summarized in the last column of Table 1. Finally, the disadvantage of imposing the restriction η = −τρ is that the ratio between the indirect effect and the direct effect of a particular explanatory variable remains constant over time; if this ratio happens to be p percent for one variable in the short term, it is also p percent in the long term.
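A minimal numerical sketch of these definitions, assuming point estimates rho, tau, eta, beta_k, and theta_k and a spatial weight matrix W are already at hand (all names hypothetical), follows; the dispersion of the effects would additionally require simulating from the variance-covariance matrix of the estimates, which is what the nsim() option of xsmle does.

```python
import numpy as np

def direct_indirect_effects(W, rho, tau, eta, beta_k, theta_k):
    """Direct and spatial spillover effects implied by Eqs. (11) and (12)."""
    N = W.shape[0]
    I = np.eye(N)
    # Short-term matrix of partial derivatives, Eq. (11)
    S_short = np.linalg.solve(I - rho * W, beta_k * I + theta_k * W)
    # Long-term matrix of partial derivatives, Eq. (12)
    S_long = np.linalg.solve((1.0 - tau) * I - (rho + eta) * W,
                             beta_k * I + theta_k * W)
    effects = {}
    for horizon, S in (("short", S_short), ("long", S_long)):
        direct = np.trace(S) / N                 # average diagonal element
        indirect = (S.sum() - np.trace(S)) / N   # average row sum of off-diagonals
        effects[horizon] = {"direct": direct, "indirect": indirect}
    return effects
```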

6

Empirical Illustration

Baltagi (2005) estimates a demand model for cigarettes based on a panel of 46 U.S. states over the period 1963–1992, in which real per capita sales of cigarettes by persons of smoking age (14 years and older), measured in packs of cigarettes per capita (Cit), is regressed on the average retail price of a pack of cigarettes measured in real terms (Pit) and on real per capita disposable income (Yit). All variables are taken in logs. The dataset can be downloaded freely from www.wiley.co.uk/baltagi/, while an adapted version of this dataset is available at www.spatial-panels.com. More details, as well as reasons to include state-specific effects (μi) and time-specific effects (ξt), are given in Baltagi (2005). The spatial weight matrix is specified as a row-normalized binary contiguity matrix whose elements are one (before row-normalization) if two states share a common border, and zero otherwise.

Column (1) of Table 2 reports the estimation results when adopting a non-dynamic spatial Durbin model without spatial and time-period fixed effects, and column (2) when including these effects. To investigate whether the fixed effects are jointly significant, one may test the hypothesis H0: μ1 = . . . = μN = ξ1 = . . . = ξT = α, where α is the intercept of the model without fixed effects. This null hypothesis can be tested with a likelihood ratio (LR) test, which is based on the log-likelihood function values of both models. The number of degrees of freedom is equal to the number of restrictions that needs to be imposed on the fixed effects to get one overall intercept, which in this particular case is N + T − 1. The outcome of this test, 2 × (1691.4 − 475.5) = 2431.8 with N + T − 1 = 46 + 30 − 1 = 75 degrees of freedom, justifies the extension of the model with spatial and time-period effects. Note that one may also separately test for the inclusion of spatial fixed effects and time-period fixed effects.

It is to be noted that the coefficient of any variable that does not change over time, or changes only a little, cannot be estimated when controlling for spatial fixed effects. Similarly, the coefficient of any variable that does not change across space, or changes only a little, cannot be estimated when controlling for time-period fixed effects. For many empirical studies this is a reason not to control for fixed effects, for example, because such time-invariant or space-invariant variables are the main focus of the analysis. However, if one or more relevant explanatory variables are omitted from the regression equation when they should be included, the estimator of the coefficients of the remaining variables is biased and inconsistent. This also holds true for fixed effects and is known as omitted regressor bias. Instead of fixed effects, one can also treat μ and ξ as random effects. Hausman's specification test can then be used to test the random effects model against the fixed effects model. However, whether the random effects model is an appropriate specification if the population may be said to be sampled exhaustively, such as all counties of a state or all regions in a country, remains controversial. A detailed discussion of this issue can be found in Elhorst (2014, Sect. 3.4).

The main shortcoming of a non-dynamic spatial Durbin model is that it cannot be used to calculate short-term effects estimates of the explanatory variables. This is made clear by Table 3, which reports the corresponding effects estimates of the models presented in Table 2; since a non-dynamic model only produces long-term effects estimates, the cells reporting short-term effects estimates are left empty. The direct effects estimates of the two explanatory variables reported in column (2) of Table 3 are significantly different from zero and have the expected signs.
Higher prices restrain people from smoking, while higher income levels have a positive effect on cigarette demand. The price elasticity amounts to −1.013 and the

Table 2 Estimation results of cigarette demand using different model specifications

Determinants  (1) Non-dynamic SDM,  (2) Non-dynamic SDM,  (3) Dynamic SDM,   (4) Dynamic SDM, cross-  (5) Dynamic SDM,
              no fixed effects      fixed effects         fixed effects      sectional averages       principal components
Intercept     2.631 (15.82)
Log(C)−1                                                  0.865 (65.04)      0.812 (59.71)            0.873 (72.27)
WLog(C)       0.337 (11.09)         0.264 (8.25)          0.076 (2.00)       0.069 (1.99)             0.001 (0.02)
WLog(C)−1                                                 −0.015 (−0.29)     0.093 (2.55)             0.080 (1.67)
Log(P)        −1.251 (−21.80)       −1.001 (−24.36)       −0.266 (−13.19)    −0.319 (−14.12)          −0.245 (−11.22)
Log(Y)        0.554 (14.96)         0.603 (10.27)         0.100 (4.16)       0.175 (5.10)             0.141 (4.75)
WLog(P)       0.780 (11.15)         0.093 (1.13)          0.170 (3.66)       0.304 (11.22)            0.260 (6.75)
WLog(Y)       −0.444 (−11.09)       −0.314 (−3.93)        −0.022 (−0.87)     0.172 (4.93)             −0.027 (−0.70)
Werror                                                                                                0.013 (0.19)
R²            0.435                 0.902                 0.977              0.914                    0.977
LogL          475.5                 1691.4                2623.3             2699.1                   3078.5
CD-test       25.16                 2.28                  0.29               2.93                     3.08

Notes: t-values in parentheses. SDM = spatial Durbin model

Table 3 Effects estimates of cigarette demand using different model specifications

Determinants                       (1) Non-dynamic SDM,  (2) Non-dynamic SDM,  (3) Dynamic SDM,  (4) Dynamic SDM, cross-  (5) Dynamic SDM,
                                   no fixed effects      fixed effects         fixed effects     sectional averages       principal components
Short-term direct effect Log(P)                                                −0.262 (−11.48)   −0.312 (−14.65)          −0.312 (−14.65)
Short-term indirect effect Log(P)                                              0.160 (3.49)      0.295 (11.39)            0.295 (11.39)
Short-term direct effect Log(Y)                                                0.099 (3.36)      0.172 (5.13)             0.172 (5.13)
Short-term indirect effect Log(Y)                                              −0.018 (−0.45)    0.169 (4.96)             0.169 (4.96)
Long-term direct effect Log(P)     −1.216 (−23.39)       −1.013 (−24.73)       −1.931 (−9.59)    −1.634 (−1.74)           −1.634 (−1.74)
Long-term indirect effect Log(P)   0.508 (7.27)          −0.220 (−2.26)        0.610 (0.98)      0.254 (0.01)             0.254 (0.01)
Long-term direct effect Log(Y)     0.530 (15.48)         0.594 (10.45)         0.770 (3.55)      0.899 (1.51)             0.899 (1.51)
Long-term indirect effect Log(Y)   −0.366 (−7.47)        −0.197 (−2.15)        0.345 (0.48)      0.008 (0.00)             0.008 (0.00)

Notes: t-values in parentheses. Column headings as in Table 2

income elasticity to 0.594. Note that these direct effects estimates differ from the coefficient estimates of −1.001 and 0.603 reported in column (2) of Table 2 due to feedback effects that arise as a result of impacts passing through neighboring states and back to the states themselves. The spatial spillover effects (indirect effects estimates) of both variables are negative and significant. Own-state price increases will restrain people not only from buying cigarettes in their own state, but to a limited extent also from buying cigarettes in neighboring states (elasticity −0.220). By contrast, whereas an income increase has a positive effect on cigarette consumption in the own state, it has a negative effect in neighboring states. We come back to this result below.

Further note that the non-dynamic spatial Durbin model without spatial and time-period effects (column (1) of Table 3) indicates a positive rather than a negative spatial spillover effect of price increases, and that only the former result would be consistent with Baltagi and Levin (1992), who found that price increases in a particular state (due to tax increases meant to reduce cigarette smoking and to limit the exposure of non-smokers to cigarette smoke) encourage consumers in that state to search for cheaper cigarettes in neighboring states. However, there are two reasons why this comparison is invalid. First, whereas Baltagi and Levin's (1992) model is dynamic, it is not spatial: they consider the price of cigarettes in neighboring states, but not any other spatial interaction effects. Second, whereas the model in column (1) contains spatial interaction effects, it is not (yet) dynamic. For these reasons it is interesting to consider the estimation results of the dynamic spatial panel data models.

Columns (3), (4) and (5) of Table 3 report the direct and indirect effects of the dynamic model, both in the short term and the long term, based on the corresponding parameter estimates reported in columns (3), (4) and (5) of Table 2. Column (3) represents a classical dynamic spatial panel data model with spatial and time-period fixed effects. In column (4) the time-period fixed effects are replaced with cross-sectional averages of the dependent variable observed at times t and t − 1, Ȳt = (1/N) Σ_{i=1}^{N} Yit and Ȳt−1 = (1/N) Σ_{i=1}^{N} Yi,t−1. Other combinations of cross-sectional averages have also been investigated (15 in total), but in line with Ciccarelli and Elhorst (2018), this combination appeared to produce the greatest reduction of the CD-test statistic. In addition, and in contrast to all the other model specifications, this model has been estimated in Stata to see whether this program offers an acceptable alternative for researchers who are not familiar with Matlab. In column (5), both the spatial and time-period effects are replaced by one principal component. One component may not seem like much, but note that it already involves the estimation of 2N parameters, which is more than the N + T parameters in the classical dynamic spatial panel data model.

To investigate whether the extension of the non-dynamic model to the dynamic spatial panel data model increases the explanatory power of the model, one may test whether the coefficients of the variables Yt−1 and WYt−1 are jointly significant using an LR test. The outcome of this test for the model in column (3), the dynamic model with the lowest log-likelihood function value of the three, amounts to 2 × (2623.3 − 1691.4) = 1863.8 with 2 degrees of freedom, and evidently justifies the extension of the model with dynamic effects.

Another way to look at this is the outcome of the CD-test statistic. When testing for cross-sectional dependence based on the consumption patterns of the 46 states over the period 1963–1992, this statistic amounts to 101.52, with an average pairwise correlation coefficient of 0.58. This outcome is highly statistically significant, indicating that cross-sectional dependence needs to be accounted for. When controlling for endogenous and exogenous interaction effects, this statistic drops to 25.16 (column (1) of Table 2), and when additionally controlling for fixed effects in space and time, it drops further to 2.28 (column (2) of Table 2). However, only when also controlling for habit persistence (the coefficient estimate of the variable Yt−1 amounts to 0.865 and is highly significant) does the CD-test statistic take a value within the interval [−1.96, +1.96] (0.29, see column (3) of Table 2). This indicates that only this dynamic model is able to effectively filter out the observed cross-sectional dependence in the data.

It is tempting to conclude that the estimation results of the other two dynamic models are not much different, but on closer inspection it is striking that especially the coefficient estimates of the spatial interaction effects in Table 2 are rather different. Whereas the spatial autoregressive parameter ρ of WYt is significant when controlling for spatial and time-period fixed effects or cross-sectional averages, it is insignificant and almost zero when controlling for principal components. Conversely, whereas the spatiotemporal coefficient η of WYt−1 is significant when controlling for cross-sectional averages and weakly significant (10% level) when controlling for principal components, it is not when controlling for spatial and time-period fixed effects. Although the spatial interaction coefficient of WLog(P) is significant in all three models, its magnitude ranges from 0.170 to 0.260. Finally, the spatial interaction coefficient of WLog(Y) is only significant in the model controlling for cross-sectional averages.

In line with this, the short- and long-term direct and spatial spillover effects and their significance levels, reported in Table 3, are also rather different. The explanation is that the short-term direct and spatial spillover effects in Eq. (11) depend on three parameters, two of which are spatial interaction effects, and their long-term counterparts in Eq. (12) on five parameters, three of which are spatial interaction effects. Any change in the magnitude and significance levels of these coefficients therefore feeds through into the effects estimates derived from them. This explains why, for example, the long-term indirect effect of income of the last model in Table 3 is relatively high compared to the other two dynamic models: the sum of the two parameters in the numerator of Eq. (12) of this model is relatively large (β + θ = 0.141 − 0.027 = 0.114), while the sum of the parameters in the denominator is relatively small (1 − τ − ρ − η = 1 − 0.873 − 0.001 − 0.080 = 0.046).

Since the dynamic model with spatial and time-period fixed effects appears to be the only model that effectively filtered out the common time trends, that is, only for this model does the CD-test applied to the residuals of Eq. (9) take a value within the interval [−1.96, +1.96], the focus below will be on the effects estimates produced by this model.
It should be noted, however, that two other recent studies found different results. Ciccarelli and Elhorst (2018) find that the dynamic model with cross-sectional averages outperforms its counterpart with spatial and time-period fixed effects. Similarly, Elhorst et al. (2018) find that the dynamic model with principal components outperforms its counterpart with spatial and time-period fixed effects. The conclusion must be that the best model to control for common time trends may differ from one empirical study to another. Since the log-likelihood function values of the models in Table 2 with cross-sectional averages and common factors exceed that of the model with spatial and time-period fixed effects, while their CD-test statistics still take values outside the interval [−1.96, +1.96], another reading is that the spatial weight matrix is not correctly specified. Using the same data set, Halleck Vega and Elhorst (2015) parameterize this matrix and find that it is denser than the binary contiguity matrix, while Kelejian and Piras (2014) find that this matrix is endogenous. This indicates that the log-likelihood function values of the models with cross-sectional averages and common factors might increase even further when a parameterized and/or an endogenous spatial weight matrix is adopted. This is an interesting issue for further research.

Consistent with microeconomic theory, the short-term direct effects appear to be substantially smaller than the long-term direct effects: −0.262 versus −1.931 for the price variable and 0.099 versus 0.770 for the income variable. This is because it takes time before price and income changes fully settle. The long-term direct effects in the dynamic spatial Durbin model, in turn, appear to be greater (in absolute value) than their counterparts in the non-dynamic spatial Durbin model: −1.931 versus −1.013 for the price variable and 0.770 versus 0.594 for the income variable. Apparently, the non-dynamic model underestimates the long-term effects.

The short-term spatial spillover effect of a price increase turns out to be positive; the elasticity amounts to 0.160 and is highly significant (t-value 3.49). This finding is in line with the original finding of Baltagi and Levin (1992) in that a price increase in one state encourages consumers to search for cheaper cigarettes in neighboring states. The negative spatial spillover effect of a price increase found earlier for the non-dynamic spatial Durbin model with fixed effects demonstrates that a non-dynamic approach falls short here. Although greater and again positive, we do not find empirical evidence that the long-term price spillover effect is also significant. A similar result is found by Debarsy et al. (2012). It is to be noted that they estimate the parameters of the model by the Bayesian MCMC estimator developed by Parent and LeSage (2010), whereas we use the consistent bias-corrected ML estimator developed by Lee and Yu (2015). Furthermore, the spatial weight matrix used in that study is based on the lengths of the state borders in common between each state and its neighboring states, whereas we use a binary contiguity matrix.

The long-term spatial spillover effect of the income variable derived from the dynamic spatial panel data model appears to be positive, which suggests that an income increase in a particular state has a positive effect on smoking not only in that state itself, but also in neighboring states. Furthermore, it is smaller than the direct effect, which makes sense since the impact of a change will most likely be larger in the place that instigated the change. However, the income spillover effect is not significant. A similar result is found by Debarsy et al. (2012).
Interestingly, the spatial spillover effect of the income variable in the non-dynamic spatial panel data model appeared to be negative and significant. Apparently, the decision whether to adopt a dynamic or a non-dynamic model again represents an important issue. Some researchers prefer simpler models to more complex ones (Occam's razor). One problem of complex models is overfitting: excessively complex models are affected by statistical noise, whereas simpler models may capture the underlying process better and may thus have better predictive performance. However, if one can trade simplicity for increased explanatory power, the complex model is more likely to be the correct one.

7

Conclusion

Spatial econometric models that include lags of the dependent variable and of the independent variables in both space and time provide a useful tool to quantify the magnitude of direct and indirect effects, both in the short term and the long term. A demand model for cigarettes based on panel data from 46 U.S. states over the period 1963 to 1992 is used to empirically illustrate this. Direct effects, rather than the coefficient estimate of a particular variable, should be used to test the hypothesis as to whether that variable has a significant effect on the dependent variable in its own economy. Similarly, indirect effects, rather than the coefficient estimate of the spatially lagged dependent variable and/or the coefficient estimates of the spatially lagged independent variables, should be used to test whether or not spatial spillovers exist. One difficulty is that it cannot be seen from the coefficient estimates and the corresponding standard errors or t-values (derived from the variance-covariance matrix) whether the direct and indirect effects in a spatial econometric model are significant. This is because these effects are composed of different coefficient estimates according to complex mathematical formulas, and the dispersion of these effects depends on the dispersion of all coefficient estimates involved. However, due to the availability of software written in Stata, R and Matlab, these computations are no longer a major obstacle for applied researchers wishing to report and discuss direct and indirect effects estimates in addition to the point estimates of the parameters of the model. Likewise, there are no longer obstacles to controlling for common factors other than time-period fixed effects, not even for applied researchers familiar with Stata only.

8

Cross-References

▶ Bayesian MCMC Estimation ▶ Cross-Section Spatial Regression Models ▶ Instrumental Variables/Method of Moments Estimation ▶ Interpreting Spatial Econometric Models ▶ Maximum Likelihood Estimation


References

Allers MA, Elhorst JP (2005) Tax mimicking and yardstick competition among governments in the Netherlands. Int Tax Public Financ 12(4):493–513
Anselin L (1988) Spatial econometrics: methods and models. Kluwer, Dordrecht
Anselin L, Le Gallo J, Jayet H (2008) Spatial panel econometrics. In: Mátyás L, Sevestre P (eds) The econometrics of panel data, fundamentals and recent developments in theory and practice, 3rd edn. Kluwer, Dordrecht, pp 627–662
Baltagi BH (2005) Econometric analysis of panel data, 3rd edn. Wiley, Chichester
Baltagi BH, Levin D (1992) Cigarette taxation: raising revenues and reducing consumption. Struct Chang Econ Dyn 3(2):321–335
Belotti F, Hughes G, Mortari AP (2017) Spatial panel-data models using Stata. Stata J 17(1):139–180
Burridge P, Elhorst JP, Zigova K (2017) Group interaction in research and the use of general nesting spatial models. In: Baltagi BH, LeSage JP, Pace RK (eds) Spatial econometrics: qualitative and limited dependent variables. Advances in econometrics, vol 37. Emerald, Bingley, pp 223–258
Ciccarelli C, Elhorst JP (2018) A dynamic spatial econometric diffusion model with common factors: the rise and spread of cigarette consumption in Italy. Reg Sci Urban Econ 72:131–142
Debarsy N, Ertur C, LeSage JP (2012) Interpreting dynamic space-time panel data models. Stat Methodol 9(1–2):158–171
Elhorst JP (2010) Dynamic panels with endogenous interaction effects when T is small. Reg Sci Urban Econ 40(5):272–282
Elhorst JP (2014) Spatial econometrics: from cross-sectional data to spatial panels. Springer, Heidelberg/New York/Dordrecht/London
Elhorst JP, Madre J-L, Pirotte A (2018) Car traffic, habit persistence, cross-sectional dependence, and spatial heterogeneity: new insights using French departmental data. University of Groningen, working paper under review
Ertur C, Koch W (2007) Growth, technological interdependence and spatial externalities: theory and evidence. J Appl Econ 22(6):1033–1062
Halleck Vega S, Elhorst JP (2015) The SLX model. J Reg Sci 55(3):339–363
Kelejian HH, Piras G (2014) Estimation of spatial models with endogenous weighting matrices, and an application to a demand model for cigarettes. Reg Sci Urban Econ 46:140–149
Kelejian HH, Piras G (2017) Spatial econometrics. Elsevier, London
Lee LF, Yu J (2015) Spatial panel data models. In: Baltagi BH (ed) The Oxford handbook of panel data. Oxford University Press, Oxford, pp 363–401
Lee LF, Yu J (2016) Identification of spatial Durbin models. J Appl Econ 31(1):133–162
LeSage JP, Pace RK (2009) Introduction to spatial econometrics. CRC Press/Taylor & Francis Group, Boca Raton
Parent O, LeSage JP (2010) A spatial dynamic panel model with random effects applied to commuting times. Transp Res B 44(5):633–645
Pesaran MH (2006) Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica 74(4):967–1012
Pesaran MH (2015a) Time series and panel data econometrics. Oxford University Press, Oxford
Pesaran MH (2015b) Testing weak cross-sectional dependence in large panels. Econ Rev 34(6–10):1089–1116
Shi W, Lee LF (2017) Spatial dynamic panel data model with interactive fixed effects. J Econom 197(2):323–347

Limited and Censored Dependent Variable Models

Xiaokun (Cara) Wang

Contents
1 Introduction
2 Limited and Censored Variable Models
2.1 Models for Discrete Responses
2.2 Models for Censored and Truncated Data
3 Models Incorporating Spatial Effects
3.1 Geographically Weighted Regression
3.2 Spatial Filtering
3.3 Spatial Regression
4 Estimation Approaches
4.1 Maximum Simulated Likelihood Estimation (MSLE)
4.2 Composite Marginal Likelihood
4.3 Bayesian Approach
5 Conclusions
References

Abstract

In regional science, many attributes, either social or natural, can be categorical. For example, choices of travel mode, presidential election outcomes, or quality of life can all be measured (and/or coded) as discrete responses, dependent on various influential factors. Some attributes, although continuous, are subject to truncation or censoring. For example, household income, when reported, tends to be censored, and only boundary values of a range are obtained. Such categorical and censored variables can be analyzed using econometric models that are established based on the concept of “unobserved/latent dependent variable.”

X. Wang (*) Department of Civil and Environmental Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA e-mail: [email protected] © Springer-Verlag GmbH Germany, part of Springer Nature 2021 M. M. Fischer, P. Nijkamp (eds.), Handbook of Regional Science, https://doi.org/10.1007/978-3-662-60723-7_92


The previous examples also share another common feature: when data are collected in a spatial setting, they are all inevitably influenced by spatial effects, either spatial variation or spatial interaction. In contrast to panel data or time-series data, such variation or dependencies are two-dimensional, making them even more complicated to handle. The need for investigating such limited and censored variables in a spatial context compels the quest for rigorous statistical methods. This chapter introduces existing methods that have been developed to analyze limited and censored dependent variables while considering spatial effects. Different model specifications are discussed, with an emphasis on discrete response models and censored data models. Different types of spatial effects and the corresponding ways to address them are then discussed. In general, when spatial variation is of major concern, geographically weighted regression is preferred; when spatial dependency is the primary interest, spatial filtering and spatial regression should be chosen. Techniques popularly used to estimate spatial limited variable models, including maximum simulated likelihood estimation, composite marginal likelihood estimation, and the Bayesian approach, are also introduced and briefly compared.

Keywords

Discrete responses · Censored and truncated data · Incorporating spatial effects · Estimation approaches

1

Introduction

In studies of social behaviors and human activities, many attributes involve categorical, truncated, or censored responses in a spatial context. For example, choices of travel mode, choices of occupation, and presidential election outcomes can all be measured (and/or coded) as discrete responses, dependent on various influential factors. Household income and pavement surface deterioration levels, when reported, tend to be censored or truncated. Such categorical and/or censored variables can be analyzed using limited dependent variable models.

The previous examples also share a common feature: they all exhibit some type of spatial effects, either spatial variation or some degree of spatial dependence. For example, in studies of ecology, wealth, vehicle crashes, and epidemics, it is known that the data generation processes often vary over space. Another example is that, even after controlling for household attributes, choice of travel mode is still expected to exhibit positive spatial correlations. Such correlation patterns can be partly explained by proximity because, in reality, there are always influential factors that cannot be controlled for (e.g., the pedestrian friendliness of all neighborhoods). The sign and magnitude of such dependence tend to vary rather gradually over space, and correlation most likely diminishes as the distance between any two observation units increases. Moreover, in a spatial context, in contrast to time-series data, such dependencies are two-dimensional, which adds complexity. The widespread nature of such phenomena and the need for understanding these behaviors compel the quest for rigorous statistical methods for the analysis of such data.

However, the handling of limited and censored dependent variables already involves the specification and estimation of nonlinear models. Considering spatial effects implies that the models have to further account for two-dimensional dependence structures across a large number of observations, leading to the manipulation of high-dimensional multivariate distributions and large matrices. In recent years, many studies have attempted to enhance the behavioral consistency of model specification and the efficiency of estimation. This chapter will introduce the related methods developed to date. The following sections will first introduce conventional econometric models used for limited and censored dependent variables, followed by a discussion of spatial models, that is, how the consideration of spatial effects can be incorporated. Estimation techniques, which are critical for the application of these models, will be discussed at the end.

2

Limited and Censored Variable Models

Models for limited and censored variables are an important subarea of econometrics. This section will explain the specification of these models in two main categories: those for discrete responses and those for censored/truncated data.

2.1

Models for Discrete Responses

Models for discrete responses are often used to model choices among sets of alternatives rather than a continuous response (Greene 2002). Such models play an important role in scientific studies, both social and natural. The specification of discrete response models tends to require specific assumptions on the error term distribution. Two commonly used specifications are probit and logit models. The most basic form is a binary response, where the value of the dependent variable is either zero or one, indicating no or yes:

y*_i = X'_i β + e_i  and  y_i = 1 if y*_i > 0, y_i = 0 if y*_i ≤ 0,  i = 1, 2, ..., N   (1)

where i indexes observations (i = 1, 2, ..., N), y*_i is a latent (unobserved) dependent variable for individual i, and y_i is the observed dependent variable. X_i is a Q × 1 vector of explanatory variables, and β is the set of corresponding parameters. e_i stands for unobservable factors for observation i and is assumed to follow an identically and independently distributed (iid) standard normal distribution for a probit model or a Gumbel distribution for a logit model. In other words, the actual response that is observed is a nonlinear function of the latent response, which can be expressed as a linear function of explanatory variables. With this model setting, it is straightforward to show that


Pr(y_i = 1 | X_i) = Pr(y*_i > 0 | X_i) = Pr(e_i > −X'_i β | X_i) = F(X'_i β)
Pr(y_i = 0 | X_i) = 1 − F(X'_i β)   (2)

where F(·) is the cumulative distribution function (CDF) of the error term e_i. The log-likelihood is thus

ln L = Σ_{i=1}^{N} [ y_i ln F(X'_i β) + (1 − y_i) ln(1 − F(X'_i β)) ].   (3)
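To make Eq. (3) concrete, the sketch below simulates binary data and maximizes the probit log-likelihood numerically; the data and all names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
N, Q = 500, 3
X = np.column_stack([np.ones(N), rng.standard_normal((N, Q - 1))])
beta_true = np.array([0.5, 1.0, -1.0])
y = (X @ beta_true + rng.standard_normal(N) > 0).astype(float)  # Eq. (1)

def neg_loglik(beta):
    # Eq. (3) with F() the standard normal CDF (probit); probabilities are
    # clipped away from 0 and 1 to keep the logarithms finite.
    p = np.clip(norm.cdf(X @ beta), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_loglik, x0=np.zeros(Q), method="BFGS")
print(res.x)  # estimates should be close to beta_true
```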

Of course, in many circumstances, the number of alternatives is more than two. If the data are ordered and each unit makes a choice from among S alternatives, the model specification can be naturally extended from the binary choice setting, with a set of threshold parameters to distinguish different levels of response (alternatives):

y_i = k  if  γ_k < y*_i < γ_{k+1},  k = 1, 2, ..., S.   (4)

To some extent, the observed variable can be considered a censored form of the latent variable. The latent variable y*_i varies continuously, but the observed response is censored by unknown boundaries γ_1 < γ_2 < ... < γ_{S+1}, leading to one of the integer responses 1, 2, ..., S. The probability for each outcome is

Pr(y_i = 1 | X_i) = F(γ_2 − X'_i β) − F(γ_1 − X'_i β)
Pr(y_i = 2 | X_i) = F(γ_3 − X'_i β) − F(γ_2 − X'_i β)
⋮
Pr(y_i = S | X_i) = F(γ_{S+1} − X'_i β) − F(γ_S − X'_i β)   (5)
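The outcome probabilities in Eq. (5) translate directly into code; a sketch for the ordered probit case, with illustrative threshold values, follows.

```python
import numpy as np
from scipy.stats import norm

def ordered_probit_probs(X, beta, gamma):
    """N x S matrix of outcome probabilities, Eq. (5). gamma holds the
    thresholds gamma_1 < ... < gamma_{S+1}, with the outer thresholds set
    to -inf and +inf so that the probabilities sum to one."""
    xb = X @ beta
    cdf = norm.cdf(gamma[None, :] - xb[:, None])  # F(gamma_k - X'beta)
    return np.diff(cdf, axis=1)                   # differences of adjacent CDFs

# Illustration with S = 3 ordered outcomes and hypothetical thresholds:
gamma = np.array([-np.inf, -0.5, 0.8, np.inf])
X = np.array([[1.0, 0.2], [1.0, -1.1]])
beta = np.array([0.3, 0.7])
print(ordered_probit_probs(X, beta, gamma))  # each row sums to one
```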

where F(·) is still the CDF of the error term e_i: standard normal in the probit model and logistic in the logit model. When the data are multinomial and unordered, a common model specification is established based on the utility maximization theory introduced by McFadden (1980). In this framework, the alternative offering the maximum utility is chosen. If U_ik indicates the utility for individual i of selecting alternative k (k = 1, 2, ..., S), the observed dependent variable for observation i, y_i, will take value m if and only if U_im is the maximum utility among all alternatives (i.e., the most attractive option). Furthermore, one response alternative is often chosen as the "base", since preference is always a relative term. If the last alternative S is used as the base, the latent utility difference can be expressed as

y*_ik = U_ik − U_iS,  k = 1, 2, ..., S − 1.   (6)

Similarly, the latent utility difference is influenced by many factors, so

y*_ik = X'_ik β + e_ik  and  y_i = m  if  y*_im > 0 and y*_im ≥ y*_ik,  k = 1, 2, ..., S − 1   (7)


X'_ik is a 1 × Q vector indicating the differences of explanatory variable values between alternative k and the base alternative S. The subscript k implies that X'_ik can be alternative-specific (such as the cost of different modes in the analysis of travel mode choice). Conventional models used for analyzing unordered categorical data are multinomial logit or multinomial probit models. When the iid assumption is potentially violated, there are other derived forms to deal with the correlated errors, for example, the nested logit model, which requires a prespecified error correlation structure, and the random parameter (mixed) logit (probit) models, which assume parameters follow random distributions.
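For the unordered case, the multinomial logit specification implies closed-form choice probabilities; a numerically stable sketch (all names hypothetical) is:

```python
import numpy as np

def mnl_probs(V):
    """Multinomial logit choice probabilities from an N x S matrix of
    latent utilities V (one column per alternative)."""
    expV = np.exp(V - V.max(axis=1, keepdims=True))  # stabilized softmax
    return expV / expV.sum(axis=1, keepdims=True)

# Two individuals, three alternatives:
V = np.array([[0.2, 1.1, -0.4],
              [0.0, 0.0, 0.5]])
print(mnl_probs(V))  # each row sums to one; highest utility, highest probability
```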

2.2

Models for Censored and Truncated Data

A sample is considered "truncated" when it is only a subset of a larger population. For example, when studies on expenditures are based on observations with positive expenditures, those with no expenditures are "truncated." A similar and more common problem is "censoring," meaning that rather than observing the exact value, only the boundary value of a range is observed. With the previous example, if expenditure over $10,000 is coded as $10,000, it is censored from above. Censored and truncated data are not representative of the population, and estimators that ignore this problem will be inconsistent, leading to incorrect marginal effects (Greene 2002). To some extent, truncated and censored regressions are similar to binary and ordered-response models. A latent dependent variable is posited taking the form y*_i = X'_i β + e_i; then, for truncated (from below at zero) regression, the observed dependent variable will be

y_i = y*_i  if  y*_i > 0.   (8)

If e_i follows a normal distribution with zero mean and variance σ², it can be shown that the conditional mean of the dependent variable is

E(y_i | y*_i > 0) = E(X'_i β + e_i | X'_i β + e_i > 0) = X'_i β + E(e_i | e_i > −X'_i β)
                 = X'_i β + σ φ(X'_i β/σ) / Φ(X'_i β/σ)   (9)

where φ(·) and Φ(·) are the probability density function (PDF) and cumulative distribution function (CDF) of a standard normal distribution, respectively. In other words, because of the truncation, the mean of the dependent variable is no longer a linear function of X'_i. For censored (also from below at zero) regression, or the Tobit model, the observed dependent variable will be


y_i = y*_i  if  y*_i > 0,  and  y_i = 0  if  y*_i ≤ 0.   (10)

The censored conditional mean is thus E(y_i | X_i) = Pr(y*_i ≤ 0) · 0 + Pr(y*_i > 0) · E(y_i | y*_i > 0), and with the results for the truncated mean, it can be shown that

E(y_i | X_i) = 0 + Φ(X'_i β/σ) [ X'_i β + σ φ(X'_i β/σ) / Φ(X'_i β/σ) ]
             = Φ(X'_i β/σ) X'_i β + σ φ(X'_i β/σ).   (11)

The above derivation can easily be extended to other truncation/censoring threshold values and to the double truncation/censoring situation. As these conditional means are nonlinear, ordinary least squares (OLS) will no longer yield consistent estimates of β. A rich body of literature related to the generalization of censored models can be found in discussions of sample selection models and treatment effects models. It should be mentioned that, to some extent, duration models and count data models can also be considered members of the limited dependent variable model family. Although their specification forms differ significantly from the previously discussed models, the notion behind their model specification is similar. If further investigation of these models is of interest, readers are referred to the work by Greene (2002), which provides more in-depth discussion of these models.
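The nonlinearity of the censored mean in Eq. (11) is easy to verify by simulation; the sketch below compares the analytical expression with a Monte Carlo estimate for a single, hypothetical value of X'β.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
sigma, xb = 1.0, 0.3                    # hypothetical sigma and X'beta
e = sigma * rng.standard_normal(1_000_000)
y = np.maximum(xb + e, 0.0)             # censoring from below at zero, Eq. (10)

z = xb / sigma
analytical = norm.cdf(z) * xb + sigma * norm.pdf(z)  # Eq. (11)
print(y.mean(), analytical)             # the two values should agree closely
```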

3

Models Incorporating Spatial Effects

Methods for dealing with spatial effects in limited and censored dependent variable models can be categorized into three types. The first is geographically weighted regression (GWR), where the consideration is mainly the spatial variation of behavioral parameters. The second, spatial filtering, has been applied more broadly; this approach essentially attempts to "filter out" the spatial effects by explicitly controlling for variables that represent the spatial dependency. The third is spatial regression, which incorporates spatial effects explicitly in the model specification, either through the spatial autocorrelation of (dependent and independent) variables, the autocorrelation of error terms, or both. This section introduces the basic concepts of these methods and discusses how they can be integrated into limited and censored dependent variable models.

3.1

Geographically Weighted Regression

GWR is established based on the assumption that relationships between variables vary from location to location; therefore, parameters should exhibit significant spatial variation. The flexible specification of GWR can be used to examine the stability and robustness of parameter estimates over space. The formulation of GWR is fairly straightforward: instead of a global regression that implies one data generation process dominates the whole population, the model is localized by allowing for one unique data generation process per observation. Taking a binary model, for example, instead of a universal regression model for the latent variable y*_i = X'_i β + e_i, the parameters are allowed to be local, that is,

y*_i = X'_i β(u_i, v_i) + e_i  and  y_i = 1 if y*_i > 0, y_i = 0 if y*_i ≤ 0,  i = 1, 2, ..., N   (12)

where (u_i, v_i) indicates the coordinates of observation i and β(u_i, v_i) is a continuous function of the map coordinates. The key advantage of GWR is that it explicitly allows for local spatial effects in relatively standard regression models (Fotheringham 2003). Still using the binary choice model as the example, with GWR the log-likelihood for the jth regression point will be

ln L_j = Σ_{i=1}^{N} w_ji [ y_i ln F(X'_i β_j) + (1 − y_i) ln(1 − F(X'_i β_j)) ]   (13)

where w_ji is the weight of the ith data point with respect to the jth regression point, normally higher for data points close to the jth regression point and decaying with distance. Comparing this expression with Eq. (3), it can be observed that the key differences are that each observation now has its own parameter values and that the regression is influenced more by nearby data points. Selecting the weight function, or kernel, that defines w_ji is often the most challenging step of GWR. The main considerations are the representation of point proximity and the selection of the bandwidth distance (the cutoff distance beyond which data points no longer influence the regression). Fotheringham (2003) describes a variety of weight specification alternatives, with Gaussian weights and their bi-square variation as the most commonly used options. The integration of GWR into limited dependent variable models is straightforward and has been applied in many studies. LeSage (1999) provided MATLAB code for estimating binary logit and probit GWR models, using crime data as an illustration. Atkinson et al. (2003) used a binary logit GWR model to identify relationships between geomorphological controls and riverbank erosion. McMillen and McDonald (1999) extended the use of GWR to multinomial discrete response analysis by specifying a multinomial logit GWR model to analyze the influence of transportation access on land use in Chicago. Luo and Wei (2009) analyzed land-use conversion (from barren, crop/grassland, forest, and water uses to urban land use) via a multinomial logit GWR model.
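A sketch of the two kernels mentioned above, written as functions of the distances between one regression point and all data points (coordinates and bandwidth are hypothetical inputs):

```python
import numpy as np

def gaussian_weights(coords, j, bandwidth):
    """Gaussian kernel weights w_ji of all data points i with respect to
    regression point j; weights decay smoothly with distance."""
    d = np.linalg.norm(coords - coords[j], axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def bisquare_weights(coords, j, bandwidth):
    """Bi-square variation: weights fall to exactly zero beyond the bandwidth."""
    d = np.linalg.norm(coords - coords[j], axis=1)
    w = (1.0 - (d / bandwidth) ** 2) ** 2
    return np.where(d < bandwidth, w, 0.0)
```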

3.2

Spatial Filtering

When spatial interaction, rather than spatial variation, is the major concern, the approaches used to address the spatial dependency include spatial filtering and spatial regression. "Spatial filtering" has many different definitions in the existing literature. The most unrestricted definition is to simply construct and control for some spatial variables so that conventional statistical models based on uncorrelated errors can still apply. For example, Dugundji and Walker (2005) considered spatial network interdependencies in their mixed logit model when studying mode choice behavior. In most recent studies, "spatial filtering" refers to the semiparametric approaches that separate spatial dependencies by dividing the original variables into filtered nonspatial variables and spatial variables. The division is often achieved using local spatial statistics such as the distance-based eigenvector procedure (Dray et al. 2006), the G-statistics-based approach (Getis 1995), and the eigenfunction-based procedure (Griffith 2000). Local spatial statistics such as Getis' G_i and Moran's I were originally developed as diagnostics to disclose local spatial dependencies that are not indicated by their global counterparts. For example, Getis' approach is established based on an indicator G_i(d), which is essentially a weighted average of the observation values around observation i:

G_i(d) = Σ_j w_ij x_j / Σ_j x_j,  for all j ≠ i   (14)

where w_ij is an element of a row-standardized geographic connectivity matrix W, and d indicates the distance or another predefined connectivity index. Getis (1995) then used the difference between the observed value G_i(d) and the expected value E(G_i) to separate spatial from nonspatial effects. In other words, observed variables were filtered as

x*_i = x_i · [Σ_j w_ij / (N − 1)] / G_i(d) = x_i · E(G_i) / G_i(d).   (15)

After all variables (both dependent and explanatory) have been filtered by such a procedure and the spatial dependency is considered removed, the conventional models introduced previously can be used directly for data analysis. Other spatial filtering techniques use different approaches to filter out the spatial effects, but they all rely on the construction and manipulation of a spatial weight matrix W, which is used to represent the spatial dependency structure. The key challenge for this approach is choosing a proper regional weighting scheme, that is, the construction of W.
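A compact sketch of the filtering step in Eqs. (14) and (15), assuming a strictly positive variable x and a connectivity matrix W with a zero diagonal (both hypothetical inputs):

```python
import numpy as np

def getis_filter(x, W):
    """Getis (1995) spatial filter. x: positive observation vector;
    W: geographic connectivity matrix with zero diagonal, so that
    G_i excludes observation i itself."""
    N = len(x)
    G = (W @ x) / (x.sum() - x)      # Eq. (14): weighted average around i
    E_G = W.sum(axis=1) / (N - 1)    # expected value E(G_i)
    return x * E_G / G               # Eq. (15): filtered, nonspatial part

# The spatial component is then the remainder: x - getis_filter(x, W)
```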

3.3

Spatial Regression

Similar to spatial filtering, spatial regression models also directly address the spatial dependencies by incorporating spatial effects in the model specification. The key difference between spatial regression and spatial filtering is that, although spatial regression may also use spatially lagged explanatory and/or response variables, as in the spatial filter models, many of the variables are treated as endogenous. LeSage and Pace (2009) summarized the motivations for using spatial regression models for data analysis. There is a big family of spatial regression models, including spatial autoregressive (SAR), spatial moving average (SMA), spatial Durbin (SDM), and spatial error (SEM) models, with SAR and SEM as the most commonly used specifications. Many existing works (Anselin 2003; LeSage and Pace 2009) have provided extensive technical discussions of these models and the underlying spatial stochastic processes. In general, SAR and SEM are used for dependent variables and error terms, respectively. Both are used rather regularly by researchers, thanks to their flexibility and applicability. The former case is also called "spatial lag," while the latter is often called "spatial error." By using these two specifications, it is assumed that the spatial process follows a recursive pattern. In a linear model setting, SAR is expressed as

y = ρWy + Xβ + e  or  y = (I_N − ρW)^{−1}Xβ + (I_N − ρW)^{−1}e,  e ~ N(0, σ²I_N)   (16)

where W still denotes the geographic connectivity matrix, ρ is the spatial coefficient representing the magnitude of the overall neighborhood influence, and I_N is an N × N identity matrix. The SEM incorporates the spatial effects in the error terms:

y = Xβ + u,  u = λWu + e,  e ~ N(0, σ²I_N)   (17)

where u is the vector of overall errors, λ is the spatial coefficient for the error terms, indicating the contribution of neighboring observations to each other's uncertainty, and e now indicates the part of the error or uncertainty caused by each observation itself. These spatial processes can be easily applied in the context of limited and censored dependent variable models by incorporating them in the formulation of the latent dependent variables. For example, a SAR probit model is simply

y* = (I_N − ρW)^{−1}Xβ + (I_N − ρW)^{−1}e,  e ~ N(0, σ²I_N),  and still
y = 1 if y* > 0, y = 0 if y* ≤ 0.   (18)

Many studies have used spatial regression models to analyze limited and censored dependent variables. For example, Beron and Vijverberg (1999) specified probit models with both spatial errors and spatial lags. Smith and LeSage (2004) incorporated a regional effect in a probit model and used Bayesian techniques to analyze the 1996 presidential election results. LeSage and Pace (2009) discussed the specification of a spatial autoregressive multinomial probit model. Wang and Kockelman (2009) developed a spatial ordered probit model with temporal correlation, and Wang et al. (2012) further extended the models to the analysis of multinomial, unordered responses.
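To illustrate Eq. (18), the sketch below simulates data from a SAR probit process on a simple ring-shaped, row-standardized weight matrix; all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200
# Row-standardized weight matrix of a ring: each unit has two neighbors.
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] = W[i, (i + 1) % N] = 0.5

rho = 0.4
beta = np.array([1.0, -0.8])
X = np.column_stack([np.ones(N), rng.standard_normal(N)])
e = rng.standard_normal(N)

# Latent variable of Eq. (18): y* = (I - rho W)^{-1} (X beta + e)
y_star = np.linalg.solve(np.eye(N) - rho * W, X @ beta + e)
y = (y_star > 0).astype(int)  # observed binary response
```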

4

Estimation Approaches

A common approach for estimating limited and censored dependent variable models is the maximum likelihood estimation (MLE) technique (see chapter ▶ "Maximum Likelihood Estimation"). When the models are further complicated by the consideration of spatial dependency, implying interdependence of observations, the joint distribution of the entire sample is no longer the product of the marginal distributions; hence, the log-likelihood is no longer additively separable into one-dimensional probabilities. The calculation of the high-dimensional distribution requires the manipulation of large matrices and a high-dimensional integral. The MLE approach thus becomes ineffective in the face of such heavy, sometimes impossible, computational burdens. Alternative estimation approaches have been explored by researchers. For example, Pinkse and Slade (1998) used the generalized method of moments (GMM) to estimate a probit model with spatial error components (see chapter ▶ "Instrumental Variables/Method of Moments Estimation" for details concerning estimation). Klier and McMillen (2008) used GMM to estimate a spatial logit model for analyzing the clustering of auto supplier plants in the USA. McMillen (1995) used simulated likelihood strategies to estimate spatial multinomial probit models. Vijverberg (1997) used recursive importance sampling (RIS) to approximate the n-dimensional log-likelihood in a spatial probit model. Bhat (2011) suggested a maximum approximate composite marginal likelihood (MACML) method, which essentially decomposes multidimensional autocorrelation into pairwise correlation. Smith and LeSage (2004), LeSage and Pace (2009), and Wang and Kockelman (2009) used a Bayesian framework in their studies of spatial discrete response models. In general, the estimation approaches discussed in the existing literature can be categorized into four types: maximum simulated likelihood estimation (MSLE), GMM, MACML, and Bayesian techniques. Among them, the use of GMM is relatively limited because it requires orthogonality conditions, which cannot be conveniently derived in multiple-response models. This section will briefly introduce the other three estimation techniques, which are used more dominantly in practice: MSLE, MACML, and Bayesian techniques.

4.1

Maximum Simulated Likelihood Estimation (MSLE)

The notion of MSLE is that, since the direct derivation of complex statistical models is impractical (e.g., when a likelihood function involves a multidimensional integral), a simulated likelihood that approximates the original likelihood is used instead. When approximated appropriately, the key model features of the original model are retained, but the computational burden is alleviated. For example, in a SAR multinomial probit model the probability involves a multivariate normal cumulative density function

Pr(V < v) = Φ(V_1 < v_1, V_2 < v_2, ..., V_L < v_L)   (19)

where V is a vector of L joint events and v is the vector of corresponding thresholds. Φ(·) still indicates the CDF of a normal distribution. The joint probability can be decomposed into the product of conditional densities:

Pr(V < v) = ∫_{−∞}^{v_1} ∫_{−∞}^{v_2} ⋯ ∫_{−∞}^{v_L} φ(V_L | V_{L−1}, ..., V_1) ⋯ φ(V_2 | V_1) φ(V_1) dV_L ⋯ dV_1